
mohaneddz/FORSA


🤖 Flask-Forsa RAG System

A high-performance, production-ready Retrieval-Augmented Generation (RAG) system built with FastAPI, featuring hybrid search, semantic reranking, and bilingual support (French/Arabic).

Built with FastAPI + Python + DeepSeek/Gemini + FAISS + BGE-M3.


🚀 Tech Stack

Backend Framework

FastAPI Python Uvicorn Pydantic

AI & ML Stack

DeepSeek Gemini Transformers PyTorch

Vector & Search

FAISS BGE-M3 BGE Reranker

Data Processing

PyMuPDF python-docx

UI/Interface

Gradio


⚡ Overview

Flask-Forsa is a production-grade RAG system designed for high-accuracy question answering with strict source attribution.

Core Capabilities

  • Hybrid Retrieval: Semantic + Lexical search fusion with MinMax normalization
  • Smart Reranking: BGE-reranker-v2-m3 cross-encoder for precision
  • Bilingual Support: Native French & Arabic processing
  • Parallel Processing: Handles up to 50 questions concurrently via ThreadPoolExecutor
  • JSON Mode: Structured, reliable LLM outputs with validation
  • Answer Validation: Grounding checks & hallucination detection
  • Source Attribution: Evidence-based answers with score-based citations
  • Document Processing: PDF, DOCX, TXT support with Gemini semantic chunking

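Embedding, indexing, and retrieval run locally, giving you full control over your data pipeline; answer generation and semantic chunking call the DeepSeek and Gemini APIs.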
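The parallel processing capability listed above can be pictured with a minimal sketch. It assumes each question is handled by a blocking function; `answer_question` is a hypothetical stand-in for the real pipeline call, and the worker count mirrors the 50-worker figure quoted in this README.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 50  # mirrors the "50 concurrent workers" figure in this README


def answer_question(question: str) -> str:
    """Hypothetical blocking call standing in for retrieval + LLM generation."""
    return f"answer to: {question}"


async def answer_all(questions: list[str]) -> list[str]:
    """Run the blocking calls in a thread pool and await them together."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        tasks = [loop.run_in_executor(pool, answer_question, q) for q in questions]
        # asyncio.gather returns results in the same order as the inputs.
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(answer_all(["q1", "q2"])))
```

Threads suit this workload because the time is mostly spent waiting on I/O, such as HTTP calls to the LLM API.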


🏗️ System Architecture

┌──────────────────────────────────────────────────────────┐
│              FastAPI REST API Endpoint                   │
│      (Structured Chat + asyncio.gather Parallelism)      │
└───────────────────────┬──────────────────────────────────┘
                        │
        ┌───────────────┴───────────────┐
        │                               │
┌───────▼────────┐              ┌──────▼──────┐
│ Query Service  │              │ RAG Service │
│ (Enhancement)  │              │  (Hybrid)   │
│ • Normalize    │              │ • Semantic  │
│ • Extract      │              │ • Lexical   │
│ • Variations   │              │ • Merge     │
└───────┬────────┘              └──────┬──────┘
        │                               │
        │         ┌─────────────────────┤
        │         │                     │
┌───────▼─────────▼───┐      ┌─────────▼──────────┐
│  Embedding Service  │      │  Vector Store      │
│    (BGE-M3 1024d)   │      │  (FAISS Index)     │
│ • GPU Accelerated   │      │ • Flat Index       │
│ • Batch Processing  │      │ • Cosine Sim       │
└─────────────────────┘      └─────────┬──────────┘
                                       │
                             ┌─────────▼──────────┐
                             │ Reranker Service   │
                             │ (BGE Reranker V2)  │
                             │ • Cross-Encoder    │
                             │ • Score Override   │
                             └─────────┬──────────┘
                                       │
                             ┌─────────▼──────────┐
                             │  Validation Utils  │
                             │ • Scope Filter     │
                             │ • Grounding Check  │
                             └─────────┬──────────┘
                                       │
                             ┌─────────▼──────────┐
                             │   LLM Service      │
                             │ • DeepSeek V3.1    │
                             │ • JSON Mode        │
                             │ • Retry Logic      │
                             └────────────────────┘

🧠 Core Algorithms

| Feature | Algorithm | Description |
| --- | --- | --- |
| Hybrid Search | Weighted Semantic + Lexical | MinMax normalization with 0.65/0.35 weights |
| Reranking | BGE Cross-Encoder Scoring | Neural reranking with score replacement |
| Query Enhancement | Multi-pass Normalization | Key term extraction + variations |
| Scope Filtering | Entity + Family Detection | Pre-filters chunks by question context |
| Answer Validation | Grounding + Citation Analysis | Checks answer support in sources |
| Deduplication | Hash-based Chunk Keys | SHA1 content hashing prevents duplicates |
| Document Chunking | Gemini Semantic Splitting | Context-aware chunk boundaries |
| Parallel Execution | asyncio.gather + ThreadPool | 50 concurrent workers for I/O operations |
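The deduplication row can be illustrated with a short sketch. The function names are hypothetical, and normalizing whitespace before hashing is an assumption rather than confirmed project behavior; only the use of SHA1 content hashes as chunk keys comes from the table.

```python
import hashlib


def chunk_key(text: str) -> str:
    """SHA1 hex digest of the whitespace-normalized text (normalization is an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def deduplicate(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct chunk, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        key = chunk_key(chunk)
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

Hashing keeps the duplicate check at constant cost per chunk, so the same passage retrieved by both the semantic and lexical paths is only kept once.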

💽 Installation Guide

1. Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended for embeddings)

2. Clone & Setup

# Clone repository
git clone <repository-url>
cd flask-forsa

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Linux/Mac)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configure Environment

Create .env.local:

# LLM API Keys
DEEPSEEK_TOKEN=your_deepseek_api_token
DEEPSEEK_API_URL=https://api.modelarts-maas.com/v2/chat/completions
GEMINI_API_KEY=your_gemini_api_key

4. Prepare Your Data

# Add documents to data/ folder
mkdir -p data
# Copy your PDFs, DOCX files here

# Run preprocessing with Gemini
python preprocess.py

5. Build Vector Index

# Index your preprocessed chunks
python build_index.py

6. Launch Server

# Start FastAPI server
python main.py

# Server runs at: http://localhost:8000
# API Docs: http://localhost:8000/docs

7. (Optional) Launch Gradio UI

# In a new terminal
python gradio_app.py

# UI runs at: http://localhost:7860

📚 API Usage

Structured Chat Endpoint

Endpoint: POST /api/chat/

Request Schema

{
  "equipe": "GhostTruth",
  "question": {
    "category_1": {
      "q1": "Quel est le dossier administratif requis pour Idoom?",
      "q2": "Quels sont les avantages de l'offre Idoom Fibre?"
    },
    "category_2": {
      "q1": "Comment souscrire à Idoom 4G LTE?"
    }
  },
  "include_sources": false
}

Response Schema

{
  "equipe": "GhostTruth",
  "reponses": {
    "category_1": {
      "q1": "Le dossier administratif comprend...",
      "q2": "Les avantages de l'offre Idoom Fibre incluent..."
    },
    "category_2": {
      "q1": "Pour souscrire à Idoom 4G LTE, vous devez..."
    }
  },
  "sources": null
}
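A client usually needs to walk the nested `reponses` object. The helper below is a small sketch of one way to flatten it; `iter_answers` is a hypothetical name, not part of the project.

```python
from typing import Iterator


def iter_answers(response: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (category, question_id, answer) triples from a chat response."""
    for category, answers in response.get("reponses", {}).items():
        for question_id, answer in answers.items():
            yield category, question_id, answer


if __name__ == "__main__":
    sample = {
        "equipe": "GhostTruth",
        "reponses": {"category_1": {"q1": "Le dossier administratif comprend..."}},
        "sources": None,
    }
    for category, question_id, answer in iter_answers(sample):
        print(category, question_id, answer)
```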

With Source Attribution

Set "include_sources": true:

{
  "sources": {
    "category_1": {
      "q1": [
        {
          "file": "guide_idoom_2024.pdf",
          "score": 0.8543,
          "evidence": "Le dossier administratif requis comprend une copie de la CIN..."
        },
        {
          "file": "procedures_algerie_telecom.pdf",
          "score": 0.7821,
          "evidence": "Documents nécessaires : justificatif de domicile, CIN..."
        }
      ]
    }
  }
}
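When sources are included, each question maps to a list of scored entries. As a sketch, a client could pick the strongest citation per question like this; `best_sources` is a hypothetical helper and assumes the `file`/`score`/`evidence` shape shown above.

```python
from typing import Optional


def best_sources(sources: Optional[dict]) -> dict[tuple[str, str], dict]:
    """Map (category, question_id) to its highest-scoring source entry."""
    best: dict[tuple[str, str], dict] = {}
    for category, questions in (sources or {}).items():
        for question_id, entries in questions.items():
            if entries:  # skip questions that returned no sources
                best[(category, question_id)] = max(entries, key=lambda e: e["score"])
    return best
```

Passing `None` (the value returned when `include_sources` is false) yields an empty mapping.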

⚙️ Configuration Reference

Edit config.py to customize behavior:

Retrieval Settings

TOP_K_SEMANTIC = 20          # Semantic search candidates
TOP_K_LEXICAL = 20           # Lexical search candidates
HYBRID_W_SEM = 0.65          # Semantic weight (0-1)
HYBRID_W_LEX = 0.35          # Lexical weight (0-1)
SIMILARITY_THRESHOLD = 0     # Minimum hybrid score (0 = disabled)
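To show how these settings could interact, here is a minimal sketch of weighted score fusion with MinMax normalization. It is not the project's `rag_service` code: the function names, the dict-based inputs, and the choice to map a constant score list to 1.0 are all assumptions.

```python
HYBRID_W_SEM = 0.65
HYBRID_W_LEX = 0.35
SIMILARITY_THRESHOLD = 0  # 0 keeps every candidate


def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; identical scores all map to 1.0 (an assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def fuse(semantic: dict[str, float], lexical: dict[str, float]) -> list[tuple[str, float]]:
    """Combine normalized semantic and lexical scores into one ranked list."""
    sem, lex = minmax(semantic), minmax(lexical)
    fused = {
        cid: HYBRID_W_SEM * sem.get(cid, 0.0) + HYBRID_W_LEX * lex.get(cid, 0.0)
        for cid in sem.keys() | lex.keys()
    }
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [(cid, score) for cid, score in ranked if score >= SIMILARITY_THRESHOLD]
```

Normalizing first matters because raw semantic scores (cosine similarities) and raw lexical scores live on different scales, so the weights would otherwise be meaningless.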

Reranking

USE_RERANKER = False                           # Enable/disable reranking
RERANKER_MODEL = "BAAI/bge-reranker-v2-m3"    # Cross-encoder model

Answer Processing

ENABLE_ANSWER_FIXER = False      # Strict RAG post-processing (not used)
ANSWER_FIXER_TEMPERATURE = 0.1   # LLM temperature for fixes

Vector Store

EMBEDDING_MODEL_NAME = "BAAI/bge-m3"  # Embedding model
EMBEDDING_DIMENSION = 1024             # Embedding dimensions
INDEX_TYPE = "Flat"                    # FAISS index type
FAISS_INDEX_PATH = "db/faiss_index.bin"
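A "Flat" FAISS index performs exhaustive search: every stored vector is compared with the query. The plain-Python sketch below illustrates that idea with cosine similarity computed as an inner product over L2-normalized vectors. It is an illustration only, does not use the FAISS API, and uses 2-dimensional toy vectors instead of the 1024-dimensional BGE-M3 embeddings.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def flat_search(index: list[list[float]], query: list[float], k: int) -> list[tuple[int, float]]:
    """Score every stored vector against the query and return the top k (position, score)."""
    q = normalize(query)
    scores = [
        (position, sum(a * b for a, b in zip(normalize(vec), q)))
        for position, vec in enumerate(index)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]
```

Exhaustive search gives exact results at a cost that grows linearly with the number of chunks, which is usually acceptable for small document collections.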

Source Limits

MIN_SOURCES = 1   # Minimum sources to return
MAX_SOURCES = 6   # Maximum sources to return

LLM Settings

MODEL_NAME = "deepseek-v3.1"          # DeepSeek model
GEMINI_MODEL = "gemini-2.5-flash"     # Gemini model for chunking
ENABLE_LLM_WARMUP = False             # Warmup on startup

📊 Performance Metrics

| Metric | Performance |
| --- | --- |
| Parallel Questions | Up to 50 concurrent |
| Avg Latency (uncached) | ~45-65s per question |
| Retrieval Speed | ~50s (hybrid + rerank) |
| Reranking Speed | ~40-60s (BGE reranker) |
| LLM Generation | 1-3s (DeepSeek) |
| Embedding Batch (GPU) | ~100 docs/s |
| Index Search (FAISS) | <1s |

Note: The current configuration disables reranking (USE_RERANKER = False) and sets SIMILARITY_THRESHOLD to 0, so no minimum-score cut-off is applied.


🔧 Advanced Features

Query Enhancement

Automatically processes queries via query_service:

  • Normalization: Lowercase, diacritics removal
  • Key Terms: Extracts important words for lexical search
  • Variations: Generates alternative phrasings
  • Type Detection: Identifies question category
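The normalization and key-term steps above could look roughly like the sketch below. It is not the actual `query_service` code: the function names and the tiny French stop-word list are illustrative assumptions.

```python
import re
import unicodedata

# Tiny illustrative stop-word list; a real one would be much larger.
STOP_WORDS = {"le", "la", "les", "de", "des", "du", "un", "une",
              "est", "sont", "pour", "quel", "quels", "comment"}


def normalize_query(query: str) -> str:
    """Lowercase the query and strip diacritics (for example 'é' becomes 'e')."""
    decomposed = unicodedata.normalize("NFKD", query.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def key_terms(query: str) -> list[str]:
    """Return normalized tokens that are longer than one character and not stop words."""
    tokens = re.findall(r"\w+", normalize_query(query))
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]
```

For instance, `key_terms("Quels sont les avantages de l'offre Idoom Fibre?")` keeps only the content words that are useful for lexical matching.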

Scope-Based Filtering

Intelligently filters chunks via validation_utils:

  • Entity Detection: Companies, products, services
  • Document Families: Categorizes by topic
  • Question Type: Matches chunk relevance

Answer Validation

Multi-layer validation in validation_utils:

  • Grounding Check: Verifies answer in sources
  • Hallucination Detection: Flags unsupported claims
  • Availability Check: Detects "no info" responses
  • Citation Accuracy: Validates source references
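As a rough illustration of a grounding check, the sketch below measures how many of the answer's longer words also appear in the retrieved sources. The token-overlap heuristic, the function names, and the 0.6 threshold are assumptions for illustration; the actual checks in `validation_utils` may work differently.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def grounding_score(answer: str, sources: list[str]) -> float:
    """Fraction of answer tokens longer than 3 characters that occur in any source."""
    answer_tokens = {t for t in _tokens(answer) if len(t) > 3}
    if not answer_tokens:
        return 0.0
    source_tokens: set[str] = set()
    for source in sources:
        source_tokens |= _tokens(source)
    return len(answer_tokens & source_tokens) / len(answer_tokens)


def is_grounded(answer: str, sources: list[str], threshold: float = 0.6) -> bool:
    """Treat an answer as grounded when enough of its tokens are supported."""
    return grounding_score(answer, sources) >= threshold
```

A low score flags an answer whose wording is not supported by the evidence, which is one simple signal of a possible hallucination.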

Document Processing

Advanced chunking with Gemini in preprocess.py:

  • Semantic Boundaries: Context-aware splits
  • Metadata Extraction: File, page, section info
  • Multi-format: PDF, DOCX, TXT support via document_processor

📁 Project Structure

flask-forsa/
├── api/
│   ├── chat.py              # Main structured chat endpoint
│   ├── embeddings.py        # Embedding utilities
│   └── cache.py             # Cache management endpoints
├── services/
│   ├── rag_service.py       # Hybrid retrieval engine
│   ├── llm_service.py       # LLM integration (DeepSeek)
│   ├── embedding_service.py # BGE-M3 embeddings
│   ├── vector_store.py      # FAISS vector operations
│   ├── reranker_service.py  # BGE reranker (optional)
│   ├── query_service.py     # Query enhancement
│   └── document_processor.py # PDF/DOCX extraction
├── utils/
│   ├── validation_utils.py  # Answer validation & scope filtering
│   ├── answer_fixer.py      # Post-processing (not used)
│   └── logging_utils.py     # Performance tracking decorator
├── data/                    # Document storage
├── db/                      # FAISS index + metadata
├── schemas.py               # Pydantic models
├── config.py                # Configuration
├── prompts.py               # System prompts
├── preprocess.py            # Gemini-based document chunking
├── build_index.py           # FAISS index builder
├── main.py                  # FastAPI app
├── gradio_app.py            # Gradio UI
├── test_preprocess.py       # Gemini test suite
└── README.md                # This file

🧪 Testing

Test Gemini Setup

python test_preprocess.py

Test API with cURL

curl -X POST http://localhost:8000/api/chat/ \
  -H "Content-Type: application/json" \
  -d '{
    "equipe": "TestTeam",
    "question": {
      "test": {
        "q1": "Comment souscrire à Idoom?"
      }
    },
    "include_sources": true
  }'
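The same request can be sent from Python using only the standard library. This is a sketch: `build_chat_request` and `send` are hypothetical helper names, and the 120-second timeout is an assumption chosen to sit above the per-question latency reported in this README.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/chat/"


def build_chat_request(team: str, questions: dict, include_sources: bool = False) -> urllib.request.Request:
    """Build a POST request matching the structured chat request schema."""
    payload = {"equipe": team, "question": questions, "include_sources": include_sources}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (the server must be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request("TestTeam", {"test": {"q1": "Comment souscrire à Idoom?"}}, True)
    print(send(req))
```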

Run with Gradio UI

# Terminal 1: Start API
python main.py

# Terminal 2: Start Gradio
python gradio_app.py

Visit http://localhost:7860 for the web interface.


📈 Monitoring & Debugging

Enable Debug Logs

In config.py:

DEBUG_LOG_RETRIEVAL = True

Log Output Includes

  • Retrieval Transparency: Hybrid scores, semantic/lexical breakdown
  • Query Enhancement: Normalization steps, key terms, variations
  • Validation Results: Scope filtering, grounding checks
  • Performance Metrics: Timing for each pipeline stage via @log_execution_time
  • Token Usage: DeepSeek input/output tokens
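A timing decorator such as `@log_execution_time` is typically only a few lines. The sketch below shows one common way to write it; the logger name and message format are assumptions, and the project's own implementation in `utils/logging_utils.py` may differ.

```python
import functools
import logging
import time

logger = logging.getLogger("forsa.timing")  # hypothetical logger name


def log_execution_time(func):
    """Log how long each call to the wrapped function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3fs", func.__name__, elapsed)
    return wrapper


@log_execution_time
def retrieve(query: str) -> list[str]:
    """Placeholder pipeline stage used to demonstrate the decorator."""
    return [query.upper()]
```

`functools.wraps` preserves the wrapped function's name and docstring, which keeps log lines and API docs readable.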

Current Logging

Logging is configured in logger_config.py and writes to the console at INFO level.


🤝 Contributing

This is a private project. Unauthorized use, reproduction, or distribution is prohibited.

For authorized contributors:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

© 2025. All rights reserved.
Unauthorized use, reproduction, modification, or distribution
of this project or its source code is strictly prohibited.

⚠️ IMPORTANT: This project is private and fully restricted.


🙏 Acknowledgments

AI & ML Models:

Infrastructure:

  • FAISS: Facebook AI vector similarity search
  • FastAPI: Modern async Python web framework
  • Gradio: ML web interfaces
  • PyMuPDF: PDF processing and text extraction
  • Transformers: HuggingFace ML model library

Built with ❤️ for production-grade RAG applications

Powered by hybrid search, semantic reranking, and bilingual intelligence.

