The ancient Tamil script is one of the earliest known writing systems, evidenced across many parts of India by epigraphic records found on rock edicts and hero stones. Although these inscriptions are a rich source of history, very few applications have been developed to recognize and translate ancient Tamil characters, mainly because of the lack of proper datasets and the scarcity of experts in these long-lost scripts. Furthermore, most stone inscriptions are in a condition that requires image enhancement and noise removal after the image is captured. The objectives of this project are to develop an architecture for dealing with unknown scripts in a systematic way, to curate a dataset for future work, and to devise an approach that overcomes the challenges of stone inscriptions and makes them easier to translate, using convolutional neural networks and other deep learning techniques.
The project is notebook-based. Files must be executed in the correct order.
Input: Raw inscription image
Output: Recognized character labels
- Image preprocessing (cleaning + deskew)
- Character segmentation (extract individual characters)
- (Optional) Multipart character handling
- Dataset preparation
- CNN model training
- Character prediction
Run this first when you have a raw inscription image.
Original.jpg
(Place in the same directory)
- Deskews image
- Applies thresholding
- Cleans noise
ImagePreProcessingFinal.jpg
Original.jpg → ImagePreProcessingFinal.jpg
After preprocessing.
ImagePreProcessingFinal.jpg
- Detects contours
- Draws bounding boxes
- Extracts character regions
box.jpg
Images/roi0.png
Images/roi1.png
...
ImagePreProcessingFinal.jpg → Images/roi*.png
Ensure the folder Images/ exists before running.
Use only if you want to cluster unlabeled character images.
Modify paths inside notebook:
imdir = r'path_to/Images'
targetdir = r'path_to/output'
- Uses VGG16 feature extraction
- Applies KMeans clustering
- Groups similar characters
Creates cluster folders inside output/
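The clustering stage can be sketched in two parts so the KMeans step stands on its own; VGG16 with ImageNet weights matches the notes above, while the 224×224 input size, `*.png` glob, and `n_clusters` value are assumptions.

```python
import glob
import os
import shutil

import numpy as np
from sklearn.cluster import KMeans

def extract_features(paths):
    """VGG16 global-average-pooled features per image (requires TensorFlow)."""
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
    from tensorflow.keras.preprocessing import image
    model = VGG16(weights="imagenet", include_top=False, pooling="avg")
    feats = []
    for p in paths:
        img = image.load_img(p, target_size=(224, 224))  # VGG16's input size
        x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        feats.append(model.predict(x, verbose=0).flatten())
    return np.array(feats)

def cluster_into_folders(paths, feats, targetdir, n_clusters=10):
    """KMeans-cluster feature vectors, copying images into cluster<k>/ folders."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(feats)
    for p, lab in zip(paths, labels):
        dest = os.path.join(targetdir, f"cluster{lab}")
        os.makedirs(dest, exist_ok=True)
        shutil.copy(p, dest)
    return labels

# Typical use:
#   paths = sorted(glob.glob(os.path.join(imdir, "*.png")))
#   cluster_into_folders(paths, extract_features(paths), targetdir)
```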
If characters are split into multiple pieces and need merging.
imdir = "path_to/input"
outdir = "path_to/output"
This notebook loads:
CNN.model
Ensure the model exists before running.
Note: CNN.model must be created first by running Recognition_1.ipynb and Recognition_2.ipynb. This step cannot be executed before model training.
Also, the multipart detection logic uses a prediction threshold (e.g., prediction[0] > 0.5). If using a multi-class CNN model, this notebook may require modification.
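The distinction drawn in the note can be illustrated with two hypothetical helpers: the `prediction[0] > 0.5` check assumes a single sigmoid output, whereas a multi-class softmax model should pick the highest-probability class instead.

```python
import numpy as np

def is_multipart(prediction, threshold=0.5):
    """Binary (sigmoid) case from the note: one score in [0, 1]."""
    return bool(prediction[0] > threshold)

def predicted_class(prediction):
    """Multi-class (softmax) case: take the argmax over class probabilities."""
    return int(np.argmax(prediction))
```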
These notebooks are inside Model-Creation/
When you have a labeled dataset.
Labelled Dataset/
class1/
img1.JPG
class2/
img2.JPG
...
DATADIR = r'path_to/Labelled Dataset'
X.pickle
y.pickle
After Recognition_1.ipynb
- Builds CNN model
- Trains for 40 epochs
- Saves trained model
CNN.model
model.h5
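A sketch of the training notebook. The layer sizes are illustrative assumptions; the 40 epochs and the CNN.model/model.h5 outputs come from the notes above (recent Keras releases only accept `.keras`/`.h5` save paths, so the `CNN.model` name may need adjusting).

```python
import numpy as np
from tensorflow.keras import layers, models

def build_and_train(X, y, epochs=40, save_path="model.h5"):
    """Build a small CNN, train it, and save the model (sketch)."""
    n_classes = int(y.max()) + 1
    model = models.Sequential([
        layers.Input(shape=X.shape[1:]),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=epochs, verbose=0)
    model.save(save_path)
    return model
```

Typical use: load `X.pickle` and `y.pickle` with `pickle.load` and pass them in.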
For prediction on new images.
CNN.model
- Update test image path inside notebook
Important: The class folder structure inside the Labelled Dataset must remain the same as during training. Changing folder order can result in incorrect label predictions.
- Loads trained model
- Predicts character class
- Prints predicted label
Recognition_1.ipynb
Recognition_2.ipynb
Recognition_3.ipynb
Ensure CNN.model already exists (model must be trained first).
1. Image_Preprocessing.ipynb
2. character_segmentation.ipynb
3. (Optional) Multipart_Concatenation.ipynb
4. Use trained model to predict each roi*.png
- Update all placeholder paths like: path_to\..., pathto\...
- Maintain consistent file names: Original.jpg, ImagePreProcessingFinal.jpg
- Ensure required folders exist before writing images.
- Python 3.x
- TensorFlow / Keras
- OpenCV
- NumPy
- scikit-learn
- glob
- matplotlib
pip install tensorflow opencv-python numpy scikit-learn matplotlib
Raw Image
↓
Preprocessing
↓
Segmentation
↓
(Optional Cleanup)
↓
Dataset Creation
↓
CNN Training
↓
Prediction
Each notebook must be executed in the correct order for successful implementation.