The ancient Tamil script is one of the earliest known writing systems, evidenced across many parts of India by epigraphic records found on rock edicts and hero stones. Although these inscriptions are a rich source of history, very few applications have been developed to recognize and translate ancient Tamil characters, mainly because of the lack of proper datasets and the scarcity of experts in these long-lost scripts. Furthermore, most stone inscriptions are in a condition that requires image enhancement and noise removal after the image is captured. The objectives of this project are to develop an architecture for dealing with unknown scripts in a systematic way, to curate a dataset for future work, and to devise an approach that overcomes the challenges of stone inscriptions and makes them easier to translate, using convolutional neural networks and other deep learning techniques.
The project is notebook-based. Files must be executed in the correct order.
Input: Raw inscription image
Output: Recognized character labels
- Image preprocessing (cleaning + deskew)
- Character segmentation (extract individual characters)
- (Optional) Multipart character handling
- Dataset preparation
- CNN model training
- Character prediction
Run this first when you have a raw inscription image.
Original.jpg
(Place in the same directory)
- Deskews image
- Applies thresholding
- Cleans noise
ImagePreProcessingFinal.jpg
Original.jpg → ImagePreProcessingFinal.jpg
After preprocessing.
ImagePreProcessingFinal.jpg
- Detects contours
- Draws bounding boxes
- Extracts character regions
box.jpg
Images/roi0.png
Images/roi1.png
...
ImagePreProcessingFinal.jpg → Images/roi*.png
Ensure the folder Images/ exists before running.
Use only if you want to cluster unlabeled character images.
Modify paths inside notebook:
imdir = r'path_to/Images'
targetdir = r'path_to/output'
- Uses VGG16 feature extraction
- Applies KMeans clustering
- Groups similar characters
Creates cluster folders inside output/
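The clustering stage can be sketched in two parts so the KMeans step stands on its own; VGG16 with ImageNet weights matches the notes above, while the 224×224 input size, `*.png` glob, and `n_clusters` value are assumptions.

```python
import glob
import os
import shutil

import numpy as np
from sklearn.cluster import KMeans

def extract_features(paths):
    """VGG16 global-average-pooled features per image (requires TensorFlow)."""
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.preprocessing import image
    model = VGG16(weights="imagenet", include_top=False, pooling="avg")
    feats = []
    for p in paths:
        img = image.load_img(p, target_size=(224, 224))  # VGG16's input size
        x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        feats.append(model.predict(x, verbose=0).flatten())
    return np.array(feats)

def cluster_into_folders(paths, feats, targetdir, n_clusters=10):
    """KMeans-cluster feature vectors, copying images into cluster<k>/ folders."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(feats)
    for p, lab in zip(paths, labels):
        dest = os.path.join(targetdir, f"cluster{lab}")
        os.makedirs(dest, exist_ok=True)
        shutil.copy(p, dest)
    return labels

# Typical use:
#   paths = sorted(glob.glob(os.path.join(imdir, "*.png")))
#   cluster_into_folders(paths, extract_features(paths), targetdir)
```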
If characters are split into multiple pieces and need merging.
imdir = "path_to/input"
outdir = "path_to/output"
This notebook loads:
CNN.model
Ensure the model exists before running.
Note: CNN.model must be created first by running Recognition_1.ipynb and Recognition_2.ipynb. This step cannot be executed before model training.
Also, the multipart detection logic uses a prediction threshold (e.g., prediction[0] > 0.5). If using a multi-class CNN model, this notebook may require modification.
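The distinction drawn in the note can be illustrated with two hypothetical helpers: the `prediction[0] > 0.5` check assumes a single sigmoid output, whereas a multi-class softmax model should pick the highest-probability class instead.

```python
import numpy as np

def is_multipart(prediction, threshold=0.5):
    """Binary (sigmoid) case from the note: one score in [0, 1]."""
    return bool(prediction[0] > threshold)

def predicted_class(prediction):
    """Multi-class (softmax) case: take the argmax over class probabilities."""
    return int(np.argmax(prediction))
```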
These notebooks are inside Model-Creation/
When you have a labeled dataset.
Labelled Dataset/
class1/
img1.JPG
class2/
img2.JPG
...
DATADIR = r'path_to/Labelled Dataset'
X.pickle
y.pickle
After Recognition_1.ipynb
- Builds CNN model
- Trains for 40 epochs
- Saves trained model
CNN.model
model.h5
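A sketch of the training notebook. The layer sizes are illustrative assumptions; the 40 epochs and the CNN.model/model.h5 outputs come from the notes above (recent Keras releases only accept `.keras`/`.h5` save paths, so the `CNN.model` name may need adjusting).

```python
import numpy as np
from tensorflow.keras import layers, models

def build_and_train(X, y, epochs=40, save_path="model.h5"):
    """Build a small CNN, train it, and save the model (sketch)."""
    n_classes = int(y.max()) + 1
    model = models.Sequential([
        layers.Input(shape=X.shape[1:]),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=epochs, verbose=0)
    model.save(save_path)
    return model
```

Typical use: load `X.pickle` and `y.pickle` with `pickle.load` and pass them in.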
For prediction on new images.
CNN.model
- Update test image path inside notebook
Important: The class folder structure inside the Labelled Dataset must remain the same as during training. Changing folder order can result in incorrect label predictions.
- Loads trained model
- Predicts character class
- Prints predicted label
Recognition_1.ipynb
Recognition_2.ipynb
Recognition_3.ipynb
Ensure CNN.model already exists (model must be trained first).
1. Image_Preprocessing.ipynb
2. character_segmentation.ipynb
3. (Optional) Multipart_Concatenation.ipynb
4. Use trained model to predict each roi*.png
- Update all placeholder paths like: path_to\..., pathto\...
- Maintain consistent file names: Original.jpg, ImagePreProcessingFinal.jpg
- Ensure required folders exist before writing images.
- Python 3.x
- TensorFlow / Keras
- OpenCV
- NumPy
- scikit-learn
- glob
- matplotlib
pip install tensorflow opencv-python numpy scikit-learn matplotlib
Raw Image
↓
Preprocessing
↓
Segmentation
↓
(Optional Cleanup)
↓
Dataset Creation
↓
CNN Training
↓
Prediction
Each notebook must be executed in the correct order for successful implementation.