A minimal yet effective implementation of GPT-2 in PyTorch. This project loads pretrained GPT-2 weights and enables text generation while providing insights into the model’s underlying architecture. It includes fine-tuning capabilities on the Stanford Alpaca dataset for instruction following.

`base_model/`:
- `model.py`: Core GPT-2 model implementation
- `MHA.py`: Multi-head attention with flash attention for speed
- `TransformerBlock.py`: Transformer block implementation with pre-layernorm
- `config.py`: Configuration settings matching gpt2-small (12 layers, 768 dim, 12 heads)
- `load_weights.py`: Maps Hugging Face GPT-2 weights to this implementation
- `utils.py`: Helper functions for tokenization and text generation
- `test.py`: Tests the model with different prompts

`fine_tune/`:
- `dataset.py`: Dataset loading and processing for Stanford Alpaca
- `train.py`: Training loop and evaluation functions
- `utils.py`: Formatting functions for instruction data and visualization tools
- `config.py`: Training configuration with hyperparameters and dataset settings
The project implements the complete GPT-2 architecture from scratch:
- Loads official Hugging Face GPT-2 weights via a precise mapping process
- Uses tiktoken for efficient tokenization compatible with GPT models
- Includes full transformer implementation with multi-head attention
- Generates text with temperature scaling and top-k sampling (sketched below)
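For reference, temperature and top-k sampling come down to a few tensor operations per decoding step. Here is a minimal sketch of that step (illustrative only, not the exact `generate` function in `base_model/utils.py`), assuming `logits` holds the model's scores for the next token:

```python
import torch

def sample_next_token(logits, temp=0.7, top_k=40):
    # Keep only the top_k highest logits; mask everything else to -inf
    if top_k is not None:
        top_vals, _ = torch.topk(logits, top_k)
        logits = torch.where(logits < top_vals[..., -1:],
                             torch.full_like(logits, float("-inf")),
                             logits)
    # Lower temperatures sharpen the distribution, higher ones flatten it
    probs = torch.softmax(logits / temp, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

A temperature of 0 would divide by zero here, so greedy decoding needs a separate argmax path.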
The fine-tuning implementation includes:
- Automatic downloading and preprocessing of the Stanford Alpaca dataset
- Custom formatting of inputs as instruction-following examples
- A complete training pipeline with:
  - Gradient accumulation for larger effective batch sizes (see the sketch after this list)
  - Validation evaluation during training
  - Learning rate scheduling
  - WandB integration for experiment tracking
  - Sample generation during training to monitor progress
- Text generation from the fine-tuned model
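Gradient accumulation just delays the optimizer step so several small batches contribute to a single weight update. A minimal sketch of the idea, using illustrative names like `accum_steps`, `model`, `optimizer`, and `train_loader` rather than the actual variables in `fine_tune/train.py`:

```python
import torch

accum_steps = 4  # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for step, (input_ids, targets) in enumerate(train_loader):
    logits = model(input_ids)  # (batch, seq, vocab)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), targets.flatten()
    )
    (loss / accum_steps).backward()  # scale so gradients average over the window
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one update every accum_steps mini-batches
        optimizer.zero_grad()
```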
- Clone the repo:

```bash
git clone https://github.com/aashu-0/FineTuning_GPT2.git
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```

- Install the dependencies:

```bash
pip install -r requirements.txt
```

```bash
# quick test with pretrained weights and different prompts
python -m base_model.test
```

This will run the model with several test prompts using different temperatures and sampling methods.
```bash
# Download and prepare the dataset
python -m fine_tune.dataset

# Fine-tune the model
python -m fine_tune.train
```

The training script will:
- Download the Alpaca dataset
- Create a manageable subset for faster experimentation
- Format the data for instruction following (roughly as sketched after this list)
- Fine-tune the model for 1 epoch while tracking progress with WandB
- Generate sample responses during training to show improvement
- Save the final model weights
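Each Alpaca record has `instruction`, optional `input`, and `output` fields, and formatting stitches them into the prompt template used in the generation example below. A rough sketch of that step (the real helper lives in `fine_tune/utils.py`, and the exact field handling and whitespace may differ):

```python
def format_example(entry):
    # entry is one Alpaca record: {"instruction": ..., "input": ..., "output": ...}
    header = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
    )
    instruction = f"### Instruction:\n{entry['instruction']}\n\n"
    # The optional input field carries extra context for the task
    context = f"### Input:\n{entry['input']}\n\n" if entry.get("input") else ""
    return header + instruction + context + f"### Response:\n{entry['output']}"
```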
Try your own prompts by modifying `base_model/test.py` or by importing the model directly:

```python
from base_model.load_weights import load_gpt2_weights_to_model
from base_model.utils import text_to_token_ids, token_ids_to_text, generate
from base_model.config import GPT2Config
import tiktoken

# Build the model with pretrained GPT-2 weights and load the tokenizer
config = GPT2Config()
model = load_gpt2_weights_to_model(config)
tokenizer = tiktoken.get_encoding('gpt2')

# Generate with temperature and top-k sampling
output_ids = generate(
    model=model,
    idx=text_to_token_ids("your prompt here", tokenizer),
    max_new_tokens=30,
    context_size=config.context_length,
    temp=0.7,
    top_k=40
)
print(token_ids_to_text(output_ids, tokenizer))
```

To generate with the fine-tuned model instead, load the saved weights into `GPTModel`:

```python
import torch
from base_model.model import GPTModel
from base_model.config import GPT2Config
from base_model.utils import text_to_token_ids, token_ids_to_text, generate
import tiktoken

# Load your fine-tuned model
config = GPT2Config()
model = GPTModel(config)
model.load_state_dict(torch.load("gpt2_finetuned.pt"))
tokenizer = tiktoken.get_encoding('gpt2')

# Format the prompt using the Alpaca-style instruction format
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Explain quantum computing in simple terms.
### Response:
"""

# Generate a response
output_ids = generate(
    model=model,
    idx=text_to_token_ids(prompt, tokenizer),
    max_new_tokens=150,
    context_size=config.context_length,
    temp=0.7,
    top_k=40
)
print(token_ids_to_text(output_ids, tokenizer))
```

Roadmap:
- Instruction fine-tuning GPT-2 on the Stanford Alpaca dataset
- Implementing LoRA for efficient training (see the sketch below)
- Evaluating the fine-tuned model
- Exploring various optimization techniques
- Model quantization for faster inference
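For the LoRA item above: the core idea is to freeze a pretrained weight matrix and learn a small low-rank update next to it. A tiny illustrative sketch of one such layer (nothing like this exists in the repo yet, and names like `LoRALinear` are made up):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Pretrained projection plus the trainable low-rank update B @ A
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```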
I'm still learning, so there might be some bugs or stuff I got wrong.