Single-file transformer in PyTorch. ~230 lines.
pip install torch
python main.py

Trains a small model to memorize a sequence. You should see the loss drop and the model reproduce the sequence perfectly.
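For a sense of what "memorize a sequence" can mean in practice, here is a minimal sketch in plain Python. It assumes the usual next-token setup, where the targets are the inputs shifted by one position. The `make_pairs` helper and the example token ids are hypothetical and not taken from main.py.

```python
def make_pairs(sequence):
    """Split a token sequence into next-token inputs and targets."""
    if len(sequence) < 2:
        raise ValueError("need at least two tokens to form a pair")
    x = sequence[:-1]        # every token except the last
    targets = sequence[1:]   # the same tokens shifted left by one
    return x, targets


sequence = [5, 1, 4, 2, 3]   # hypothetical token ids
x, targets = make_pairs(sequence)
print(x)        # [5, 1, 4, 2]
print(targets)  # [1, 4, 2, 3]
```

At each position the model is asked to predict the token in `targets` given the tokens in `x` up to that point, so a loss near zero would mean the sequence has been memorized.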
from transformer import TransformerConfig, Transformer
config = TransformerConfig()
model = Transformer(config)
model.init_weights()
logits = model(x) # forward pass
loss = model(x, targets) # training
model.generate([1, 2, 3], max_tokens=50) # generation
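As a rough illustration of what a generation call like the one above does, here is a sketch of greedy autoregressive decoding in plain Python. Whether the model decodes greedily or samples is not stated here, so treat greedy decoding as an assumption. `greedy_generate` and `toy_next_logits` are hypothetical stand-ins, not part of transformer.py.

```python
def greedy_generate(next_logits, prompt, max_tokens):
    """Append the highest-scoring next token, max_tokens times."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        logits = next_logits(tokens)  # one score per vocabulary id
        best = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(best)           # feed the choice back in as context
    return tokens


# Hypothetical stand-in for a trained model:
# it always favors (last token + 1) mod vocab_size.
def toy_next_logits(tokens, vocab_size=5):
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits


print(greedy_generate(toy_next_logits, [1, 2, 3], max_tokens=4))
# [1, 2, 3, 4, 0, 1, 2]
```

The loop structure is the point: each new token is chosen from the scores for the current context and then appended, so it becomes part of the context for the next step.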