This project aims to build an Automatic Speech Recognition (ASR) system for the Nepali language. Using OpenAI's Whisper Small model as the base, we fine-tuned it on a custom dataset to accurately transcribe Nepali speech into text.
- Data Preparation: Scripts for cleaning, preprocessing, and augmenting Nepali speech data.
- Model Training: Configuration and scripts for fine-tuning the Whisper model.
- Inference and Evaluation: Tools and demo interfaces to run the model on new audio samples.
- Frontend and Deployment: A Streamlit application for interactive user testing.
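Whisper models expect 16 kHz mono input, so data preparation typically includes downmixing and resampling. The sketch below illustrates that step only; the function name and the simple linear-interpolation resampling are illustrative assumptions, not the project's actual preprocessing scripts:

```python
import numpy as np

WHISPER_SR = 16_000  # Whisper models expect 16 kHz mono input


def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix multi-channel audio to mono and resample to 16 kHz.

    Illustrative sketch: uses linear interpolation for resampling,
    which is simpler (and lower quality) than a proper polyphase
    resampler such as librosa's or torchaudio's.
    """
    if audio.ndim == 2:  # (samples, channels) -> average channels to mono
        audio = audio.mean(axis=1)
    if sr != WHISPER_SR:
        duration = audio.shape[0] / sr
        n_out = int(round(duration * WHISPER_SR))
        old_t = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
        new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(new_t, old_t, audio)
    return audio.astype(np.float32)
```

For example, one second of 44.1 kHz stereo becomes a 16 000-sample mono array ready to feed to the Whisper feature extractor.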
- Some audio files contain background noise that affects transcription quality.
- Limited data; more data would help the model generalize to different scenarios.
- Audio with multiple channels or overlapping speakers is not transcribed well.
- Collect more diverse and high-quality Nepali speech data.
- Train larger models if GPU resources become available.
Clone the repository with `git clone`, then install the requirements:

```bash
pip install -r requirements.in
```
- Clone the repository:

  ```bash
  git clone https://github.com/fuseai-fellowship/Nepali-Speech-to-Text-Translation.git
  ```

- Change to the inference directory:

  ```bash
  cd src/inference
  ```

- Run the Streamlit demo:

  ```bash
  streamlit run app.py
  ```
- Alternatively, run inference from the command line on an audio file:

  ```bash
  python src/inference.py test.mp3
  ```
Refer to the dataset README for details on the dataset, its sources, usability, and the link to the data.
## Updated Code Structure

```
├── assets
├── dataset
│   ├── male-female-data (SLR143)
│   ├── ne_np_female (SLR43)
│   ├── preperation_scripts
│   ├── scraping
│   ├── synthetic_data_using_TTS
│   └── README.md
├── docs
├── notebook
│   ├── finetuning-whispher-on-Nepali-base_old_data.ipynb
│   ├── finetuning-whispher-on-Nepali-small_old_data.ipynb
│   ├── notebook_inference_and_push_hub.ipynb
│   ├── whisper_fine_tune_5_epoch.ipynb
│   └── whispher-finetune-on-small_NP_ASR_data.ipynb
├── src
│   ├── inference
│   ├── inference.py
│   ├── test.mp3
│   ├── train.py
│   └── utils.py
├── tests
│   └── test_template.py
├── Dockerfile
├── Makefile
├── pyproject.toml
├── README.md
├── requirements.in
└── requirements.txt
```

- `dataset`: Data preparation scripts.
- `src`: Model training and architecture.
- `src/utils`: Utility functions for processing audio and model output.
- `src/inference`: Inference scripts and the Streamlit demo.
- `requirements.in`: List of Python dependencies.
- `Makefile`: Commands to set up and manage the project.
- HuggingFace Demo: https://huggingface.co/spaces/kshitizzzzzzz/NEPALI_ASR_Whisper_Small
- Model source code: https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py
We used the Word Error Rate (WER) to evaluate the accuracy of the ASR system. WER is calculated as follows:

WER = (S + D + I) / N

where S is the number of substituted words, D the number of deletions, I the number of insertions, and N the number of words in the reference transcript.
A lower WER indicates a better-performing model.
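To make the metric concrete, WER can be computed as a word-level Levenshtein edit distance normalized by the reference length. This is a minimal sketch; in practice a library such as `jiwer` or HuggingFace `evaluate` would typically be used (an assumption, not a statement about this project's code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, if one word out of a three-word reference is substituted, the WER is 1/3.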
The current model has a WER of 32 on the Common Voice and other collected validation sets.
