A Rust port of Karpathy's microgpt that trains and runs inference entirely on an ESP32.
The model learns to generate human-like names from scratch — no pre-trained weights, no cloud API, just 4,192 parameters training on a microcontroller.
```
step   0/1000: loss = 3.3071
step 100/1000: loss = 2.4193
step 500/1000: loss = 1.9888
step 999/1000: loss = 2.0980

--- Generated names (temperature=0.5) ---
arona, raeli, cealin, malie, sunaya, arishel, mosile ...
```
A 1-layer GPT transformer matching the original Python implementation:
| Spec | Value |
|---|---|
| Parameters | 4,192 (16.4 KB) |
| Embedding dim | 16 |
| Attention heads | 4 |
| Layers | 1 |
| Block size | 16 |
| Vocab | 27 tokens (a-z + BOS) |
| Normalization | RMSNorm (initial + pre-attention + pre-MLP, no learnable params) |
| Optimizer | Adam (lr=0.01, beta1=0.85, beta2=0.99) |
| Training | 1,000 steps on 32K names |
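The table notes that the RMSNorm layers carry no learnable parameters, so normalization is a pure function of the input. A minimal host-side sketch (the epsilon value is an assumption, not taken from the port):

```rust
/// RMSNorm without a learnable scale: divide each element by the
/// root-mean-square of the whole vector.
fn rmsnorm(x: &[f32]) -> Vec<f32> {
    let eps = 1e-5; // hypothetical epsilon; the port's actual value may differ
    let mean_sq: f32 = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * inv).collect()
}

fn main() {
    // RMS of [3, 4] is sqrt(12.5) ≈ 3.5355
    println!("{:?}", rmsnorm(&[3.0, 4.0]));
}
```

With no gain vector there is nothing for the optimizer to update here, which keeps both the parameter count and the Adam state small.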
The Python microgpt uses a scalar-level autograd (one graph node per multiply/add). For a single forward pass, this creates ~30K-50K nodes consuming 1-2 MB — more than the ESP32's entire 520 KB of SRAM.
Instead, this port uses explicit matrix-level forward and backward passes, storing only the activations needed for backprop (~25 KB). The backward pass is hand-derived and verified against numerical gradients.
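Verifying a hand-derived backward pass against numerical gradients boils down to comparing the analytic gradient with a central-difference estimate. A toy sketch of the idea on a scalar loss (the loss function and names here are illustrative, not the port's):

```rust
// Toy loss f(w) = (w0*w1 - 1)^2 with a hand-derived gradient, checked
// against central finite differences: (f(w+h) - f(w-h)) / 2h.
fn loss(w: &[f32; 2]) -> f32 {
    let d = w[0] * w[1] - 1.0;
    d * d
}

fn analytic_grad(w: &[f32; 2]) -> [f32; 2] {
    let d = w[0] * w[1] - 1.0;
    [2.0 * d * w[1], 2.0 * d * w[0]]
}

fn numerical_grad(w: &[f32; 2]) -> [f32; 2] {
    let h = 1e-3;
    let mut g = [0.0f32; 2];
    for i in 0..2 {
        let mut wp = *w;
        let mut wm = *w;
        wp[i] += h;
        wm[i] -= h;
        g[i] = (loss(&wp) - loss(&wm)) / (2.0 * h);
    }
    g
}

fn main() {
    let w = [0.5, 3.0];
    println!("analytic {:?} vs numerical {:?}", analytic_grad(&w), numerical_grad(&w));
}
```

The same pattern scales to the real model: perturb one parameter at a time, rerun the forward pass, and confirm the hand-written backward pass agrees to within finite-difference error.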
| Memory | Size |
|---|---|
| Model parameters | 17 KB |
| Gradients | 17 KB |
| Adam state (m + v) | 34 KB |
| Activation cache | 25 KB |
| Dataset (in flash, not RAM) | 0 KB |
| Total SRAM | ~100 KB of the ~300 KB available to the application |
```
src/
  main.rs        Training loop + inference entry point
  model.rs       GPT forward pass, parameter layout, KV cache
  backward.rs    Manual backward pass with gradient accumulation
  optimizer.rs   Adam optimizer
  tensor.rs      Vector-matrix math, RMSNorm
  tokenizer.rs   Character-level encode/decode (a-z + BOS)
  rng.rs         Xorshift32 PRNG + Box-Muller for Gaussian init
data/
  names.txt      32K training names (embedded in flash via include_str!)
```
~1,000 lines of Rust. All core logic is platform-independent and testable on the host.
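The weight-initialization scheme named in `rng.rs` combines two classic tricks: xorshift32 for cheap uniform randoms and Box-Muller to turn pairs of uniforms into Gaussians. A sketch using the standard Marsaglia shift triple (the port's seed handling and exact constants may differ):

```rust
// Xorshift32 PRNG (Marsaglia's 13/17/5 triple) plus Box-Muller to
// produce standard-normal samples for Gaussian weight init.
struct Xorshift32 {
    state: u32, // must be nonzero
}

impl Xorshift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }

    /// Uniform in (0, 1] — strictly positive so ln() below is safe.
    fn next_f32(&mut self) -> f32 {
        self.next_u32() as f32 / u32::MAX as f32
    }

    /// One standard-normal sample via the Box-Muller transform.
    fn next_gauss(&mut self) -> f32 {
        let u1 = self.next_f32();
        let u2 = self.next_f32();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f32::consts::PI * u2).cos()
    }
}

fn main() {
    let mut rng = Xorshift32 { state: 42 };
    println!("{} {} {}", rng.next_gauss(), rng.next_gauss(), rng.next_gauss());
}
```

Both pieces are a handful of integer/float ops with no heap allocation, which is exactly what you want on a microcontroller with no OS entropy source.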
- Rust ESP32 toolchain (`espup install`)
- `espflash` for flashing (`cargo install espflash`)
- An ESP32 dev board (any ESP32-WROOM-32 variant)
```sh
# Run tests on host
make test

# Build for ESP32
make build

# Flash and monitor serial output
make flash

# Just monitor (already flashed)
make monitor
```

The project compiles and runs natively for development:
```sh
RUST_LOG=info RUSTUP_TOOLCHAIN=stable cargo run --target aarch64-apple-darwin
```

Training: For each of 1,000 steps, a random name is sampled from the dataset, encoded as tokens, and fed through the transformer. The cross-entropy loss is backpropagated through every operation — attention, FFN, embeddings — and Adam updates the weights.
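The gradient that kicks off that backward pass has a well-known closed form: for softmax followed by cross-entropy, the gradient at the logits is `softmax(logits) - one_hot(target)`. A sketch (function name is illustrative, not the port's API):

```rust
/// Gradient of cross-entropy loss w.r.t. the logits, computed directly
/// as softmax(logits) - one_hot(target). Subtracting the max before
/// exponentiating keeps the softmax numerically stable.
fn cross_entropy_grad(logits: &[f32], target: usize) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut grad: Vec<f32> = exps.iter().map(|e| e / sum).collect();
    grad[target] -= 1.0; // subtract the one-hot target
    grad
}

fn main() {
    println!("{:?}", cross_entropy_grad(&[0.0, 0.0, 0.0], 0));
}
```

Fusing softmax and cross-entropy this way avoids materializing a Jacobian, which matters when every kilobyte of activation cache counts.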
Inference: Starting from the BOS token, the model autoregressively samples one character at a time (with temperature scaling) until it produces another BOS or hits the block size limit.
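Temperature scaling just divides the logits by the temperature before the softmax, then a token is drawn from the resulting distribution. A sketch of the two steps (names are illustrative; the port's sampling code may differ):

```rust
/// Softmax over logits divided by the temperature. Lower temperature
/// sharpens the distribution; temperature 1.0 leaves it unchanged.
fn softmax_with_temperature(logits: &[f32], temp: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|l| l / temp).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Inverse-CDF sampling: walk the cumulative distribution until it
/// passes u, a uniform draw from [0, 1).
fn sample(probs: &[f32], u: f32) -> usize {
    let mut cum = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cum += p;
        if u < cum {
            return i;
        }
    }
    probs.len() - 1
}

fn main() {
    let probs = softmax_with_temperature(&[1.0, 2.0, 3.0], 0.5);
    println!("token {}", sample(&probs, 0.3));
}
```

At the generation temperature of 0.5 used above, the model favors its most likely next characters while still varying enough to produce distinct names.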
The hard part is the attention backward pass: position t's query attends to all keys/values at positions 0..t, so key and value gradients accumulate contributions from every future position. Processing backward through the sequence ensures each position's KV gradients are complete before they're used.
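The value-gradient half of that accumulation is the easiest piece to see in isolation: since the attention output at position t is a weighted sum of values at positions 0..=t, the chain rule gives d_v[s] a contribution `attn[t][s] * d_out[t]` from every t >= s. A simplified single-head sketch (shapes and names are illustrative):

```rust
/// Accumulate value gradients for causal attention: d_v[s] sums
/// attn[t][s] * d_out[t] over all query positions t >= s.
/// attn[t] holds the softmax weights of query t over positions 0..=t.
fn attention_backward_values(
    attn: &[Vec<f32>],  // attn[t][s], s <= t (causal mask)
    d_out: &[Vec<f32>], // gradient of the loss w.r.t. attention output at t
    dim: usize,
) -> Vec<Vec<f32>> {
    let seq_len = attn.len();
    let mut d_v = vec![vec![0.0f32; dim]; seq_len];
    // Walking t from the end mirrors the port's backward order: by the
    // time earlier positions run their own backward step, d_v (and d_k)
    // have already received every contribution from later queries.
    for t in (0..seq_len).rev() {
        for s in 0..=t {
            for d in 0..dim {
                d_v[s][d] += attn[t][s] * d_out[t][d];
            }
        }
    }
    d_v
}

fn main() {
    let attn = vec![vec![1.0], vec![0.5, 0.5]];
    let d_out = vec![vec![1.0], vec![2.0]];
    println!("{:?}", attention_backward_values(&attn, &d_out, 1));
}
```

The key gradients follow the same accumulation shape, with an extra softmax-Jacobian term per query position.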