Modular deep learning project using PyTorch for classifying the Street View House Numbers (SVHN) dataset with a custom CNN. Contains training scripts, notebooks, figures, reports and the final model's parameters.
Test accuracy achieved: 0.9543
Chose to build a custom CNN instead of fine-tuning pretrained models, and chose the SVHN dataset for the project. No rigorous EDA was necessary, since the dataset contains only labeled digit images. Used data augmentation during training and tuned hyperparameters extensively with Optuna. Spent a few hours training the tuned model, then saved the model and checkpoints, wrote the training scripts and organised the remaining project structure.
- Build a CNN from scratch for multi-class image classification
- Face, understand and debug common deep learning pipeline issues
- Implement a stable and correct training + validation pipeline
- Perform extensive tuning of hyperparameters using Optuna
- Improve model performance through architecture and data pipeline refinement
SVHN (Street View House Numbers)
- Real-world digit classification dataset (0–9)
- Images of size 32×32 (resized to 28×28 during preprocessing)
Dataset source: http://ufldl.stanford.edu/housenumbers/
Note: Dataset is not included in the repository.
If using torchvision.datasets.SVHN, pass root='data' when creating the SVHN instance.
If using the link above, place the .mat files inside the data/ directory.
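torchvision.datasets.SVHN reads the same train_32x32.mat, test_32x32.mat and extra_32x32.mat files directly from root and already remaps the labels. If you load the raw .mat files yourself instead, the conversion might look roughly like the sketch below. It relies on the published .mat layout (images stored as (32, 32, 3, N), digit 0 labelled as 10); the helper names are hypothetical and this is not necessarily how src/data does it.

```python
import numpy as np


def convert_svhn_mat(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Turn arrays from an SVHN *_32x32.mat file into (N, 32, 32, 3) images and 0-9 labels."""
    images = np.transpose(X, (3, 0, 1, 2))  # (H, W, C, N) -> (N, H, W, C); HWC suits albumentations
    labels = y.reshape(-1).astype(np.int64)  # (N, 1) -> (N,), copied so the input is untouched
    labels[labels == 10] = 0  # the .mat files use label 10 for the digit 0
    return images, labels


def load_svhn_mat(path: str) -> tuple[np.ndarray, np.ndarray]:
    """Load one split, e.g. data/train_32x32.mat."""
    from scipy.io import loadmat  # lazy import: the conversion helper works without SciPy

    mat = loadmat(path)
    return convert_svhn_mat(mat["X"], mat["y"])
```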
- Build a baseline CNN and ensure correct training behavior
- Debug issues related to shapes, channels, and loss computation
- Implement a proper train/validation split, with data augmentation applied to the train split only
- Tune hyperparameters using Optuna
- Improve the architecture (BatchNorm, Dropout, adaptive pooling) and use the optimizer settings found through tuning
- Final training with early stopping and checkpointing
- Used the train split for training and validation
- Used the extra split to increase the amount of training data
- Validation set carved out of the train split (no augmentation)
- Separate transforms per split:
  - Train: augmentation + normalization
  - Validation/Test: normalization only
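torch.utils.data.random_split returns Subsets that share the parent dataset's transform, so giving the train and validation parts different transforms needs a small wrapper. The sketch below shows one way to do it; TransformedSubset and split_train_val are hypothetical names, and the validation fraction and seed are assumptions.

```python
import torch
from torch.utils.data import Dataset


class TransformedSubset(Dataset):
    """A view over selected indices of a base dataset, with its own transform."""

    def __init__(self, base, indices, transform=None):
        self.base = base  # base dataset should return untransformed (image, label) pairs
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        image, label = self.base[self.indices[i]]
        if self.transform is not None:
            image = self.transform(image)
        return image, label


def split_train_val(base, val_fraction=0.1, train_transform=None, val_transform=None, seed=42):
    """Shuffle indices once with a fixed seed, then give each part its own transform."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(base), generator=generator).tolist()
    n_val = int(len(base) * val_fraction)
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return (
        TransformedSubset(base, train_idx, train_transform),  # augmentation + normalization
        TransformedSubset(base, val_idx, val_transform),      # normalization only
    )
```

albumentations pipelines are called as pipeline(image=img)["image"], so they can be passed as e.g. train_transform=lambda img: train_aug(image=img)["image"] (torchvision's SVHN returns PIL images, which would need np.array(img) first). The extra split would be appended to the training side only, for example with torch.utils.data.ConcatDataset, so validation never sees augmented or extra samples.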
Used the albumentations library for augmentation. Augmentations include:
- Random cropping with padding
- Color jitter (brightness, contrast, saturation)
- Very slight blur and rotations
- No heavy rotations to preserve digit structure
Custom CNN with:
- Conv → ReLU → MaxPool → BatchNorm blocks
- Progressive channel increase
- MaxPooling for spatial reduction
- AdaptiveAvgPool2d to remove dependency on input size
- Fully connected classifier with dropout
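A minimal PyTorch sketch of the block pattern described above. The class name, channel widths, hidden size and dropout rate are placeholders, not the tuned values; the dense head deliberately has no BatchNorm, in line with the tuning finding reported later in this README. Because of AdaptiveAvgPool2d, the same model accepts both 28×28 and 32×32 inputs.

```python
import torch
import torch.nn as nn


class SVHNNet(nn.Module):
    """Hypothetical CNN: Conv -> ReLU -> MaxPool -> BatchNorm blocks plus a dropout classifier."""

    def __init__(self, channels=(32, 64, 128), hidden=256, dropout=0.3, num_classes=10):
        super().__init__()
        blocks, in_ch = [], 3
        for out_ch in channels:  # progressive channel increase
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),         # halves the spatial size
                nn.BatchNorm2d(out_ch),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)  # (N, C, H, W) -> (N, C, 1, 1) for any H, W
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes),  # raw logits, as CrossEntropyLoss expects
        )

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)))


model = SVHNNet()
logits_28 = model(torch.randn(4, 3, 28, 28))
logits_32 = model(torch.randn(4, 3, 32, 32))  # different input size, same output shape
```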
- Loss: CrossEntropyLoss
- Optimizer: Adam (tuned parameters)
- Early stopping based on validation loss
- Model checkpointing (best model saved)
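One way to structure the per-epoch training and validation steps with CrossEntropyLoss and Adam is sketched below. The function names are hypothetical, the learning rate and weight decay are placeholders for the tuned values, and the tiny synthetic dataset and linear model exist only to make the snippet runnable on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()  # enables Dropout and batch statistics in BatchNorm
    total_loss, n = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        n += labels.size(0)
    return total_loss / n  # mean loss per sample


@torch.no_grad()
def evaluate(model, loader, criterion, device):
    model.eval()  # BatchNorm uses running stats, Dropout is disabled
    total_loss, correct, n = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += criterion(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        n += labels.size(0)
    return total_loss / n, correct / n  # (mean loss, accuracy)


# Synthetic demo; swap in the real model, loaders and tuned Adam settings.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
x, y = torch.randn(64, 3, 28, 28), torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # placeholder values
train_loss = train_one_epoch(model, loader, criterion, optimizer, device)
val_loss, val_acc = evaluate(model, loader, criterion, device)
```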
Training the best model took around an hour on my RTX 3050. Early stopping was triggered at epoch 40, and the best epoch was epoch 30.
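Early stopping with best-model checkpointing can be sketched as a small helper. The patience used in this project isn't stated; 10 is an assumption that happens to match stopping at epoch 40 after a best epoch of 30. The loss curve in the demo is made up purely to show the mechanics, and the checkpoint path is hypothetical.

```python
import torch


class EarlyStopping:
    """Stop when validation loss hasn't improved for `patience` epochs; optionally save the best weights."""

    def __init__(self, patience=10, min_delta=0.0, checkpoint_path=None):
        self.patience = patience
        self.min_delta = min_delta
        self.checkpoint_path = checkpoint_path  # e.g. "checkpoints/best_model.pt" (hypothetical)
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, val_loss, epoch, model=None):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            if model is not None and self.checkpoint_path is not None:
                torch.save(model.state_dict(), self.checkpoint_path)  # keep only the best weights
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation curve: improves until epoch 30, then slowly gets worse.
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch in range(1, 101):
    val_loss = 1.0 / epoch if epoch <= 30 else 1.0 / 30 + 0.001 * (epoch - 30)
    if stopper.step(val_loss, epoch):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch)  # 40 30
```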
- Used Optuna with SQLite storage
- Tuned:
  - Learning rate
  - Weight decay
  - Optimizer choice and its beta parameters
  - Dropout
  - Architecture parameters, such as the number of conv and dense layers and the number of filters/neurons in each
  - Whether to use BatchNorm in dense layers (results were better without it)
The study ran 80 trials with MedianPruner() for pruning, and the whole optimisation took around 15-20 minutes.
Final tuned CNN:
- Test Accuracy: ~95.43%
- Stable training and validation curves (figures available in reports/figures)
- No significant overfitting observed
- Saving checkpoints is incredibly useful, especially for training across multiple sessions
- Tuning was very helpful in improving accuracy beyond 0.90
- AdaptiveAvgPool2d simplified the architecture and improved generalization
- Data augmentation improved robustness across epochs
- Saving Optuna study results is helpful for persistence and analysis
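To resume training across sessions, a checkpoint needs more than the model weights: the optimizer state and the epoch counter matter too. A minimal sketch, where the function names, dictionary keys and file path are hypothetical:

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch, best_val_loss):
    """Store everything needed to continue training later."""
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),  # Adam moment estimates, step counts
        "best_val_loss": best_val_loss,
    }, path)


def load_checkpoint(path, model, optimizer, device="cpu"):
    """Restore model and optimizer in place; return the next epoch to run and the best loss so far."""
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1, ckpt["best_val_loss"]


# Demo with a toy model: take one step so the optimizer has state, save, then restore into fresh objects.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
path = os.path.join(tempfile.gettempdir(), "svhn_demo_checkpoint.pt")
save_checkpoint(path, model, optimizer, epoch=12, best_val_loss=0.21)

restored = nn.Linear(4, 2)
restored_opt = torch.optim.Adam(restored.parameters(), lr=1e-3)
start_epoch, best = load_checkpoint(path, restored, restored_opt)
```

A study stored in SQLite can be reopened in the same spirit with optuna.load_study(study_name=..., storage=...).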
src/ # Core training pipeline
│
├── models/ # CNN architecture
├── data/ # Data loading and transforms
└── utils/ # Configs and helpers
notebooks/ # Experimentation and tuning
reports/ # Plots and Optuna outputs
configs/ # config.json
checkpoints/ # saved models and checkpoints
data/ # SVHN .mat files (not included)
1. Download the SVHN dataset, either:
   - Use torchvision.datasets.SVHN with root='data', or
   - Download the .mat files from http://ufldl.stanford.edu/housenumbers/ and place them inside data/
2. Install dependencies:
   pip install -r requirements.txt
3. Train the model:
   python -m src.models.train
4. Evaluate the model:
   python -m src.models.evaluate
5. Summarise the model:
   python -m src.models.summary
- Dataset is not included due to size
- Model is trained from scratch (no pretrained networks used)
- Focus is on understanding and building the pipeline
Was a pretty fun project ngl. Performance would likely be better with a pretrained model, but I wanted to handcraft a CNN myself. Working with augmentations in the albumentations library was fun too. Tuning felt exciting as well, because you're curious about how your new parameters will perform. Sometimes it's a pleasant improvement, other times it's a disappointment, but overall it's still exciting. It was also pretty fun seeing the changes as I tweaked the architecture in real time. Overall, a fun little project.