This project presents a high-accuracy solution for handwritten letter and character recognition using the EMNIST dataset. It leverages a pre-trained Convolutional Neural Network (CNN), fine-tunes it with advanced augmentation and regularization techniques, and employs a systematic approach to overcome the challenges inherent in the dataset.
- High Accuracy: Achieves 88.61% on the imbalanced EMNIST ByClass split and 91.06% on the EMNIST Balanced split, performing close to or at state-of-the-art benchmarks.
- Transfer Learning: Utilizes pre-trained `EfficientNet-B2` and `EfficientNet-B3` models, modified for grayscale input and the specific EMNIST class structure.
- Advanced Augmentation: Employs `MixUp` and `CutMix` to effectively regularize the model, significantly reducing overfitting and improving generalization.
- Class Imbalance Handling: Implements a custom-weighted `KLDivLoss` function to address the severe class imbalance in the ByClass dataset, ensuring fair training across all characters.
- Systematic Optimization: Uses Weights & Biases for experiment tracking and systematically determines the best optimizer (`Lion`), learning rate scheduler (`CosineAnnealingWarmRestarts`), and hyperparameters.
This project uses the EMNIST (Extended MNIST) dataset, which is a large collection of handwritten characters and digits.
- Structure: The images are reformatted into a $28 \times 28$ grayscale format, similar to the original MNIST dataset.
- Dataset Splits Used:
  - ByClass: 62 classes, highly imbalanced. Contains 697,932 training images and 116,323 test images.
  - Balanced: 47 balanced classes.
- Challenges:
  - Class Imbalance: In the ByClass split, the most frequent class appears over 17 times more often than the least frequent one (33,374 vs. 1,896 samples).
  - Data Quality: The dataset contains mislabeled and pre-augmented images (e.g., rotated by 90 degrees), which complicates classification.
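Inverse-frequency class weights are one standard way to quantify and counter this imbalance; a minimal sketch (the exact weighting scheme used by the project is not shown here, and the normalization is an assumption):

```python
import numpy as np

def inverse_frequency_weights(counts):
    """Weight each class inversely to its frequency.

    Normalized so that a perfectly balanced dataset yields all-ones
    weights (assumed normalization; the report does not specify one).
    """
    counts = np.asarray(counts, dtype=np.float64)
    return counts.sum() / (len(counts) * counts)

# ByClass extremes from the report: 33,374 vs. 1,896 samples
print(33374 / 1896)  # imbalance ratio, ~17.6x
```

Rare classes receive proportionally larger weights, so they contribute comparably to the loss despite appearing far less often.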
The core of this project is a modified EfficientNet model. While several architectures were tested, EfficientNet-B2 and EfficientNet-B3 provided the best balance of performance and computational efficiency.
The pre-trained model was adapted for this task with two key modifications:
- The first convolutional layer was changed to accept 1-channel grayscale images instead of the standard 3-channel RGB input.
- The final classification layer was replaced with a new one tailored for the 62 classes of the EMNIST ByClass dataset.
- Image Resizing: Original $28 \times 28$ images were resized to $112 \times 112$. This resolution was experimentally determined to offer the best trade-off between feature extraction quality and computational load.
- Orientation Fix: A custom transform was applied to correct the orientation of raw EMNIST images, which are stored rotated.
- Normalization: The dataset's mean and standard deviation were recalculated after resizing and applied to all images.
- Augmentation for Regularization: Overfitting was a significant challenge. While initial attempts included dropout and standard augmentations, the most effective strategy was a combination of `MixUp` and `CutMix`. This approach proved so effective that other augmentations were no longer necessary.
- Framework: The model was built and trained using PyTorch.
- Optimizer: After experimenting with Adam, AdamW, and SGD, the `Lion` optimizer was found to deliver the best results.
- Loss Function: To counter class imbalance in the ByClass split, a `KLDivLoss` function was used with pre-computed class weights. This ensures that the model does not become biased towards more frequent classes.
- Learning Rate Scheduler: A warmup schedule was implemented using `SequentialLR`, which transitions to a `CosineAnnealingWarmRestarts` scheduler after 5 epochs. This stabilized initial training and helped convergence.
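A sketch of how these pieces fit together. `WeightedKLDivLoss` is one plausible reading of "custom-weighted `KLDivLoss`" (the exact weighting in the project is not shown), and since Lion is not in torch core (it ships in packages such as `lion-pytorch`), AdamW stands in to keep the sketch self-contained:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import (
    SequentialLR, LinearLR, CosineAnnealingWarmRestarts,
)

class WeightedKLDivLoss(nn.Module):
    """KL-divergence loss with per-class weights (assumed scheme)."""
    def __init__(self, class_weights: torch.Tensor):
        super().__init__()
        self.register_buffer("class_weights", class_weights)
        self.kl = nn.KLDivLoss(reduction="none")

    def forward(self, log_probs, target_probs):
        per_sample = self.kl(log_probs, target_probs).sum(dim=1)
        # Weight each sample by the weights of its (soft) target classes,
        # so rare classes contribute more to the gradient.
        sample_w = (target_probs * self.class_weights).sum(dim=1)
        return (per_sample * sample_w).mean()

model = nn.Linear(10, 62)  # stand-in for the EfficientNet model

# from lion_pytorch import Lion
# optimizer = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Linear warmup for 5 epochs, then cosine annealing with warm restarts.
# start_factor and T_0 are assumed values, stepped once per epoch.
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = CosineAnnealingWarmRestarts(optimizer, T_0=10)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])
```

With soft (one-hot or MixUp-mixed) targets, the sample weight reduces to the weight of the dominant class, which is what counteracts the ByClass imbalance.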
The model matched or exceeded established benchmarks.
| Dataset Split | Validation Accuracy | Benchmark | F1 Score |
|---|---|---|---|
| EMNIST ByClass | 88.61% | 88.43% | 87.59% |
| EMNIST Balanced | 91.06% | 91.06% | 90.98% |