A simple Retrieval-Augmented Generation (RAG) system that supports both English and Bengali queries, designed to fetch relevant information from PDF documents and generate meaningful answers.
Develop a basic RAG pipeline capable of understanding and responding to both English and Bengali queries by retrieving relevant information from a PDF document corpus and generating grounded answers.
- Multilingual Support: Accepts user queries in both English and Bengali
- Document Retrieval: Fetches relevant document chunks from knowledge base
- Contextual Answers: Generates responses based on retrieved information
- Memory Management: Maintains both short-term and long-term memory
- Source: HSC26 Bangla 1st Paper (Bengali textbook)
- Processing: Advanced pre-processing and data cleaning for improved chunk accuracy
- Storage: Document chunking and vectorization in vector database
- Short-Term Memory: Recent chat sequence inputs
- Long-Term Memory: PDF document corpus stored in vector database
POST /chat
Request Body:
{
"query": "অনুপমের ভাষায় সত্যপুরুষ কাকে বলা হয়েছে?"
}Response:
{
"answer": "শুভনাথ",
"sources": ["chunk_id_1", "chunk_id_2"],
"confidence": 0.85
}| Query (Bengali) | Expected Answer |
|---|---|
| অনুপমের ভাষায় সত্যপুরুষ কাকে বলা হয়েছে? | শুভনাথ |
| কাকে অনুপমের ভাগ্য দেবতা বলে উল্লেখ করা হয়েছে? | মামাকে |
| বিয়ের সময় কল্যাণীর প্রকৃত বয়স কত ছিল? | ১৫ বছর |
# Clone the repository
git clone <repository-url>
cd multilingual-rag-system
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your configurationpython main.py --query "অনুপমের ভাষায় সত্যপুরুষ কাকে বলা হয়েছে?"# Start the server
python app.py
# Make requests
curl -X POST http://localhost:5000/chat \
-H "Content-Type: application/json" \
-d '{"query": "কাকে অনুপমের ভাগ্য দেবতা বলে উল্লেখ করা হয়েছে?"}'