Code repository for the book "50 ML projects to understand LLMs: Investigate transformer mechanisms through data analysis, visualization, and experimentation" by Mike X Cohen, PhD.
This repository contains all Python code and Jupyter notebooks for the 50 projects in the book. Each project includes:
- Helper notebook: Incomplete code to work through the projects yourself (hints and guidance are in the book).
- Solutions notebook: Complete, working implementation corresponding to the detailed explanations in the book.
All code runs on Google Colab, so you don't need to install anything locally or manage library configurations.
Learn how LLMs like GPT and BERT actually work by applying machine learning techniques to their internal activations. This book takes a unique approach: rather than building LLMs from scratch or using them via APIs, you'll investigate their mechanisms by treating hidden states, attention patterns, and embeddings as data to analyze.
Through 50 hands-on projects, you'll learn to:
- Inspect and visualize transformer internals
- Analyze attention mechanisms and layer dynamics
- Apply statistical and causal methods to understand model behavior
- Manipulate activations to test hypotheses about LLM mechanisms
Each project teaches skills in three areas: machine learning techniques, LLM mechanisms, and Python coding with data visualization.
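As a rough illustration of what "treating activations as data" can look like, here is a minimal sketch that is not taken from the book's notebooks. It assumes a synthetic array `hidden_states` of shape (layers, tokens, hidden dimension) standing in for real model activations, and the helper names `layer_norms` and `adjacent_layer_similarity` are hypothetical. With a real model you would replace the random array with activations extracted from the network.

```python
import numpy as np

# Synthetic stand-in for real activations (an assumption for illustration only).
# Shape: (n_layers, n_tokens, hidden_dim). The cumulative sum loosely mimics
# how each layer adds its output to a running residual stream.
rng = np.random.default_rng(seed=0)
n_layers, n_tokens, hidden_dim = 6, 10, 32
hidden_states = np.cumsum(
    rng.normal(size=(n_layers, n_tokens, hidden_dim)), axis=0
)


def layer_norms(states):
    """Mean L2 norm of the token vectors in each layer."""
    return np.linalg.norm(states, axis=-1).mean(axis=-1)


def adjacent_layer_similarity(states):
    """Mean cosine similarity between each token's vector in layer k and k+1."""
    a, b = states[:-1], states[1:]
    dots = (a * b).sum(axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return (dots / norms).mean(axis=-1)


norms = layer_norms(hidden_states)
sims = adjacent_layer_similarity(hidden_states)
print("mean norm per layer:", np.round(norms, 2))
print("cosine similarity between adjacent layers:", np.round(sims, 2))
```

The same two summaries (how large the vectors are, and how much they change from one layer to the next) can be plotted per layer to get a first look at layer dynamics.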
See the spreadsheet for detailed information.
See the PDF for the table of contents and Chapter 1 (introductions).
| Format | Link |
|---|---|
| Paperback | Amazon |
| Gumtree |
No installation required: all notebooks run directly in Google Colab.
- Python programming experience (beginner to intermediate level)
- Basic familiarity with machine learning concepts (helpful but not required)
- Curiosity about how LLMs work
The book introduces ML techniques as needed, so you don't need to be an ML expert to get started.
If you use this code in your research or projects, please cite this repository's GitHub URL.
This code is released under the MIT License. See the LICENSE file for details.
Mike X Cohen, PhD is a former neuroscience professor, full-time educator, and Udemy bestselling instructor with 25 years of experience teaching machine learning, mathematics, and data science.
Other books by Mike X Cohen:
- Linear Algebra: Theory, Intuition, Code
- Modern Statistics: Intuition, Math, Python, R
- Calculus Unraveled: Intuition, Proofs, and Python
For questions about the code or book content, please open an issue in this repository.