Handmade CNN for SVHN Digit Classification (PyTorch)

A modular deep learning project in PyTorch for classifying the Street View House Numbers (SVHN) dataset with a custom CNN. Contains training scripts, notebooks, figures, reports and the final model's parameters.
Test accuracy achieved: 0.9543

TLDR

Chose to build a custom CNN instead of fine-tuning a pretrained one, and chose SVHN as the dataset. No rigorous EDA was necessary, since the dataset contains only images. Used data augmentation during training. Tuned hyperparameters extensively with Optuna. Spent a few hours training the tuned model. Then saved the model and checkpoints, wrote the training scripts and organised the rest of the project structure.

Objective

Build a CNN from scratch for multi-class image classification
Face, understand and debug common deep learning pipeline issues
Implement a stable and correct training + validation pipeline
Perform extensive tuning of hyperparameters using Optuna
Improve model performance through architecture and data pipeline refinement

Dataset

SVHN (Street View House Numbers)

Real-world digit classification dataset (0–9)
Images of size 32×32 (resized to 28×28 during preprocessing)

Dataset source: http://ufldl.stanford.edu/housenumbers/

Note: Dataset is not included in the repository.
If using torchvision.datasets.SVHN, pass root='data' when creating the SVHN instance. If using the link above, place the .mat files inside the data/ directory.
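
For the .mat route, the cropped-digit files store images under the key X with shape (32, 32, 3, N) and labels under y with shape (N, 1), with the digit 0 stored as class 10. A minimal sketch of converting that layout into image and label arrays is shown here. The function name prepare_svhn_arrays and the use of scipy.io.loadmat are assumptions for illustration and may differ from the loader in src/data.

```python
import numpy as np


def prepare_svhn_arrays(mat):
    """Convert a loaded SVHN .mat dict into (N, 32, 32, 3) images and 0-9 labels.

    Hypothetical helper: `mat` is the dict returned by scipy.io.loadmat.
    """
    # SVHN stores images as (H, W, C, N); move the sample axis to the front.
    images = np.transpose(mat["X"], (3, 0, 1, 2))
    # Labels come as an (N, 1) column; astype makes a copy we can safely edit.
    labels = mat["y"].astype(np.int64).reshape(-1)
    # The dataset encodes the digit 0 as class 10, so map it back to 0.
    labels[labels == 10] = 0
    return images, labels


# Usage (assumes train_32x32.mat was placed in data/ as described above):
#   from scipy.io import loadmat
#   images, labels = prepare_svhn_arrays(loadmat("data/train_32x32.mat"))
```

The images stay in height-width-channel order here because that is the layout albumentations expects.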

Approach

General Approach

Build a baseline CNN and ensure correct training behavior
Debug issues related to shapes, channels, and loss computation
Implement proper train/validation split and data augmentation for train split
Tune hyperparameters using Optuna
Adopt the improved architecture (BatchNorm, Dropout, adaptive pooling) and the optimizer settings found through tuning
Final training with early stopping and checkpointing

Data Handling

Used train split for training + validation
Used extra split to increase training data
Validation set created from train split (no augmentation)
Proper transform separation:
- Train: augmentation + normalization
- Validation/Test: normalization only
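
One way to get the split described above is to shuffle the indices of the train split once with a fixed seed and carve off a validation slice. This is a minimal sketch: split_indices, the 10% validation fraction and the seed are illustrative assumptions, not values taken from this repository.

```python
import random


def split_indices(n_samples, val_fraction=0.1, seed=42):
    """Return (train_indices, val_indices) as two disjoint, shuffled lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# Usage sketch with PyTorch (dataset objects are hypothetical):
#   from torch.utils.data import ConcatDataset, Subset
#   train_idx, val_idx = split_indices(len(train_aug))
#   train_set = ConcatDataset([Subset(train_aug, train_idx), extra_aug])
#   val_set = Subset(train_plain, val_idx)
# train_aug and train_plain would be two views of the same train split,
# one with augmentation and one with normalization only.
```

Keeping two dataset instances over the same files is what lets the validation subset skip augmentation while sharing indices with the training subset.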

Data Augmentation

Used the albumentations library for augmentation. Augmentations include:

Random cropping with padding
Color jitter (brightness, contrast, saturation)
Very slight blur and rotations
No heavy rotations to preserve digit structure

Model

Custom CNN with:

Conv → ReLU → MaxPool → BatchNorm blocks
Progressive channel increase
MaxPooling for spatial reduction
AdaptiveAvgPool2d to remove dependency on input size
Fully connected classifier with dropout
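
A minimal PyTorch sketch of this kind of architecture is shown below. The class name, channel widths, hidden size and dropout rate are assumptions chosen for illustration; the tuned values come from the Optuna study and are not reproduced here.

```python
import torch
from torch import nn


class SmallSVHNNet(nn.Module):
    """Illustrative CNN: Conv -> ReLU -> MaxPool -> BatchNorm blocks plus a dense head."""

    def __init__(self, channels=(32, 64, 128), hidden=256, dropout=0.3, num_classes=10):
        super().__init__()
        blocks, in_ch = [], 3
        for out_ch in channels:  # progressive channel increase
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),          # halves the spatial resolution
                nn.BatchNorm2d(out_ch),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        # Collapses any remaining spatial size to 1x1, so the classifier
        # does not depend on the input resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)))
```

Because of the adaptive pooling layer, the same network accepts both 28×28 and 32×32 inputs without any change to the classifier.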

Training

Loss: CrossEntropyLoss
Optimizer: Adam (tuned parameters)
Early stopping based on validation loss
Model checkpointing (best model saved)

Training the best model took around an hour on my RTX 3050. Early stopping was triggered at epoch 40; the best epoch was epoch 30.
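
Early stopping on validation loss can be sketched with a small helper like the one below. The class name and its parameters are hypothetical. A patience of 10 epochs would be consistent with the run above (best epoch 30, stop at epoch 40), but the actual setting is not stated here, so treat it as an assumption.

```python
class EarlyStopping:
    """Track the best validation loss and signal when to stop training."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0
        self.improved = False

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            self.improved = True      # caller should save a checkpoint now
        else:
            self.bad_epochs += 1
            self.improved = False
        return self.bad_epochs >= self.patience


# Usage sketch inside a training loop (model, path and losses are hypothetical):
#   stopper = EarlyStopping(patience=10)
#   for epoch in range(1, max_epochs + 1):
#       val_loss = validate(model)
#       stop = stopper.step(epoch, val_loss)
#       if stopper.improved:
#           torch.save(model.state_dict(), "checkpoints/best.pt")
#       if stop:
#           break
```

Saving only when the loss improves means the checkpoint on disk always corresponds to the best epoch, not the last one.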

Hyperparameter Tuning

Used Optuna with SQLite storage
Tuned:
- Learning rate
- Weight decay
- Optimizer choice and related beta parameters
- Dropout
- Architecture-related parameters, such as number of Conv and dense layers, and number of filters/neurons for each.
- Usage of BatchNorm for dense layers (better results without BatchNorm for dense layer)
  
The study ran 80 trials, with MedianPruner() used for pruning. The entire optimisation took around 15-20 minutes.

Results

Final tuned CNN:

Test Accuracy: ~95.43%
Stable training and validation curves (figures available in reports/figures)
No significant overfitting observed
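
For reference, test accuracy of this kind is typically computed as the fraction of correctly classified samples over the whole test loader. The sketch below assumes a loader that yields (images, labels) batches; the function name is hypothetical and this is not necessarily how src.models.evaluate is written.

```python
import torch


@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()  # disable dropout and use BatchNorm running statistics
    correct = total = 0
    for images, labels in loader:
        logits = model(images.to(device))
        predictions = logits.argmax(dim=1).cpu()
        correct += (predictions == labels.cpu()).sum().item()
        total += labels.numel()
    return correct / total
```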

Key Insights

Saving checkpoints is incredibly useful, especially to train across multiple sessions
Tuning was very helpful in improving accuracy beyond 0.90
AdaptiveAvgPool2d simplified architecture and improved generalization
Data augmentation improved robustness across epochs
Saving Optuna study results is helpful for persistence and analysis

Project Structure

src/ # Core training pipeline
│
├── models/ # CNN architecture
├── data/ # Data loading and transforms
└── utils/ # Configs and helpers

notebooks/ # Experimentation and tuning
reports/ # Plots and Optuna outputs
configs/ # config.json
checkpoints/ # saved models and checkpoints
data/ # SVHN .mat files (not included)

How to Run

Download SVHN dataset:
- Use torchvision.datasets.SVHN with root='data'
  OR
- Download from http://ufldl.stanford.edu/housenumbers/ and place .mat files inside data/

Install dependencies:

pip install -r requirements.txt

Train model:

python -m src.models.train

Evaluate model:

python -m src.models.evaluate

Summarise model:

python -m src.models.summary

Notes

Dataset is not included due to size
Model is trained from scratch (no pretrained networks used)
Focus is on understanding and building the pipeline

Reflections

Was a pretty fun project, ngl. Performance would likely be better with a pretrained model, but I wanted to handcraft a CNN myself. Working with the albumentations library for augmentation was fun too. Tuning felt exciting as well, because you're curious how each new set of parameters will perform. Sometimes it's a pleasant improvement, other times a disappointment, but overall still exciting. It was also fun to see the results change as I tweaked the architecture. Overall, a fun little project.
