A minimal yet effective implementation of GPT-2 in PyTorch. This project loads pretrained GPT-2 weights and enables text generation while providing insights into the model’s underlying architecture. It includes fine-tuning capabilities on the Stanford Alpaca dataset for instruction following.

`base_model/`:
- `model.py`: Core GPT-2 model implementation
- `MHA.py`: Multi-head attention with flash attention for speed
- `TransformerBlock.py`: Transformer block implementation with pre-layernorm
- `config.py`: Configuration settings matching gpt2-small (12 layers, 768 dim, 12 heads)
- `load_weights.py`: Maps Hugging Face GPT-2 weights to this implementation
- `utils.py`: Helper functions for tokenization and text generation
- `test.py`: Tests the model with different prompts

`fine_tune/`:
- `dataset.py`: Dataset loading and processing for Stanford Alpaca
- `train.py`: Training loop and evaluation functions
- `utils.py`: Formatting functions for instruction data and visualization tools
- `config.py`: Training configuration with hyperparameters and dataset settings
The project implements the complete GPT-2 architecture from scratch:
- Loads official Hugging Face GPT-2 weights via a precise mapping process
- Uses tiktoken for efficient tokenization compatible with GPT models
- Includes full transformer implementation with multi-head attention
- Generates text with temperature scaling and top-k sampling (sketched below)
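For reference, temperature and top-k sampling come down to a few tensor operations per decoding step. Here is a minimal sketch of that step (illustrative only, not the exact `generate` function in `base_model/utils.py`), assuming `logits` holds the model's scores for the next token:

```python
import torch

def sample_next_token(logits, temp=0.7, top_k=40):
    # Keep only the top_k highest logits; mask everything else to -inf
    if top_k is not None:
        top_vals, _ = torch.topk(logits, top_k)
        logits = torch.where(logits < top_vals[..., -1:],
                             torch.full_like(logits, float("-inf")),
                             logits)
    # Lower temperatures sharpen the distribution, higher ones flatten it
    probs = torch.softmax(logits / temp, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

A temperature of 0 would divide by zero here, so greedy decoding needs a separate argmax path.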
The fine-tuning implementation includes:
- Automatic downloading and preprocessing of the Stanford Alpaca dataset
- Custom formatting of inputs as instruction-following examples
- A complete training pipeline with:
  - Gradient accumulation for larger effective batch sizes (see the sketch after this list)
  - Validation evaluation during training
  - Learning rate scheduling
  - WandB integration for experiment tracking
  - Sample generation during training to monitor progress
- Text generation from the fine-tuned model
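Gradient accumulation just delays the optimizer step so several small batches contribute to a single weight update. A minimal sketch of the idea, using illustrative names like `accum_steps`, `model`, `optimizer`, and `train_loader` rather than the actual variables in `fine_tune/train.py`:

```python
import torch

accum_steps = 4  # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for step, (input_ids, targets) in enumerate(train_loader):
    logits = model(input_ids)  # (batch, seq, vocab)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), targets.flatten()
    )
    (loss / accum_steps).backward()  # scale so gradients average over the window
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one update every accum_steps mini-batches
        optimizer.zero_grad()
```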
- Clone the repo:

```bash
git clone https://github.com/aashu-0/FineTuning_GPT2.git
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```

- Install the dependencies:

```bash
pip install -r requirements.txt
```

```bash
# quick test with pretrained weights and different prompts
python -m base_model.test
```

This will run the model with several test prompts using different temperatures and sampling methods.
```bash
# Download and prepare the dataset
python -m fine_tune.dataset

# Fine-tune the model
python -m fine_tune.train
```

The training script will:
- Download the Alpaca dataset
- Create a manageable subset for faster experimentation
- Format the data for instruction following (roughly as sketched after this list)
- Fine-tune the model for 1 epoch while tracking progress with WandB
- Generate sample responses during training to show improvement
- Save the final model weights
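Each Alpaca record has `instruction`, optional `input`, and `output` fields, and formatting stitches them into the prompt template used in the generation example below. A rough sketch of that step (the real helper lives in `fine_tune/utils.py`, and the exact field handling and whitespace may differ):

```python
def format_example(entry):
    # entry is one Alpaca record: {"instruction": ..., "input": ..., "output": ...}
    header = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
    )
    instruction = f"### Instruction:\n{entry['instruction']}\n\n"
    # The optional input field carries extra context for the task
    context = f"### Input:\n{entry['input']}\n\n" if entry.get("input") else ""
    return header + instruction + context + f"### Response:\n{entry['output']}"
```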
Try your own prompts by modifying `base_model/test.py` or by importing the model directly:

```python
from base_model.load_weights import load_gpt2_weights_to_model
from base_model.utils import text_to_token_ids, token_ids_to_text, generate
from base_model.config import GPT2Config
import tiktoken

# Build the model with pretrained GPT-2 weights and load the tokenizer
config = GPT2Config()
model = load_gpt2_weights_to_model(config)
tokenizer = tiktoken.get_encoding('gpt2')

# Generate with temperature and top-k sampling
output_ids = generate(
    model=model,
    idx=text_to_token_ids("your prompt here", tokenizer),
    max_new_tokens=30,
    context_size=config.context_length,
    temp=0.7,
    top_k=40
)
print(token_ids_to_text(output_ids, tokenizer))
```

To generate with the fine-tuned model instead, load the saved weights into `GPTModel`:

```python
import torch
from base_model.model import GPTModel
from base_model.config import GPT2Config
from base_model.utils import text_to_token_ids, token_ids_to_text, generate
import tiktoken

# Load your fine-tuned model
config = GPT2Config()
model = GPTModel(config)
model.load_state_dict(torch.load("gpt2_finetuned.pt"))
tokenizer = tiktoken.get_encoding('gpt2')

# Format the prompt using the Alpaca-style instruction format
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Explain quantum computing in simple terms.
### Response:
"""

# Generate a response
output_ids = generate(
    model=model,
    idx=text_to_token_ids(prompt, tokenizer),
    max_new_tokens=150,
    context_size=config.context_length,
    temp=0.7,
    top_k=40
)
print(token_ids_to_text(output_ids, tokenizer))
```

Roadmap:
- Instruction fine-tuning GPT-2 on the Stanford Alpaca dataset
- Implementing LoRA for efficient training (see the sketch below)
- Evaluating the fine-tuned model
- Exploring various optimization techniques
- Model quantization for faster inference
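For the LoRA item above: the core idea is to freeze a pretrained weight matrix and learn a small low-rank update next to it. A tiny illustrative sketch of one such layer (nothing like this exists in the repo yet, and names like `LoRALinear` are made up):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Pretrained projection plus the trainable low-rank update B @ A
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```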
I'm still learning, so there might be some bugs or stuff I got wrong.