A sophisticated Retrieval-Augmented Generation (RAG) application that answers questions based on PDF documents using FastAPI, ChromaDB, and Google Gemini 2.0 Flash. The system automatically processes and indexes PDF documents to provide accurate, context-aware answers.
- Features
- System Architecture
- Project Structure
- Prerequisites
- Quick Start Guide
- Detailed Setup
- Usage Guide
- API Documentation
- Configuration
- Troubleshooting
- Automated PDF Processing: Pre-loads and processes PDF documents on startup
- Intelligent Text Chunking: Splits documents with configurable overlap for better context
- Advanced Tokenization: Uses tiktoken for accurate token counting and processing
- Semantic Search: Leverages Sentence Transformers for high-quality embeddings
- Persistent Storage: ChromaDB vector database with automatic persistence
- AI-Powered Answers: Google Gemini 2.0 Flash for intelligent response generation
- RESTful API: FastAPI-based API with automatic documentation
- Web Interface: Beautiful HTML frontend for easy testing
- Real-time Status: Health monitoring and system status checks
- Asynchronous Processing: FastAPI async endpoints for better performance
- Error Handling: Comprehensive error handling and logging
- CORS Support: Cross-origin resource sharing for web integration
- Configurable Parameters: Customizable chunk sizes, overlap, and model settings
- Database Management: Reset, reload, and manage document collections
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PDF Document │ -> │ Text Extraction│ -> │ Text Chunking │
│ (IN 1501.pdf) │ │ (PyPDF2) │ │ (Smart Overlap) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
v v v
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Tokenization │ -> │ Vector Embeddings│ -> │ ChromaDB Store │
│ (tiktoken) │ │(SentenceTransf.) │ │ (Persistent) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
v v v
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ User Question │ -> │ Semantic Search │ -> │ Context Retrieval│
│ (Frontend) │ │ (Similarity) │ │ (Top-K Results) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
v v v
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Answer Gen. │ <- │ Gemini 2.0 Flash│ <- │ Context + Query │
│ (Response) │ │ (LLM) │ │ (Prompt Eng.) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
rag-pdf-qa/
├── 📄 main.py # FastAPI application & endpoints
├── ⚙️ config.py # Configuration settings
├── 📋 pdf_processor.py # PDF extraction and chunking logic
├── 🧮 embeddings.py # Tokenization & embedding generation
├── 🗄️ vector_store.py # ChromaDB integration & management
├── 🤖 llm_service.py # Gemini LLM integration
├── 🌐 frontend.html # Web interface for testing
├── 🔧 setup_pdf.py # PDF preprocessing utility
├── 📦 requirements.txt # Python dependencies
├── 🔐 .env # Environment variables (API keys)
├── 📖 README.md # This documentation
├── 🧪 test_main.http # HTTP test requests
├── 📁 data/ # PDF documents folder
│ └── IN 1501 - Signals.pdf
├── 🗃️ chroma_db/ # Vector database (auto-created)
└── 🐍 __pycache__/ # Python cache files
- Python: 3.8 or higher
- RAM: Minimum 4GB (8GB+ recommended for larger documents)
- Storage: ~500MB for dependencies + document storage
- OS: Windows, macOS, or Linux
- Google AI Studio Account: For Gemini API access
- Internet Connection: For downloading models and API calls
# Navigate to your project directory
cd "d:\Uom IT\session"pip install -r requirements.txt# Copy the example environment file
copy .env.example .env
# Edit .env file and add your Google API key:
# GOOGLE_API_KEY=your_actual_api_key_herepython main.py
# OR
uvicorn main:app --reload --host 0.0.0.0 --port 8000- API: http://localhost:8000
- Web Interface: Open
frontend.htmlin your browser - API Docs: http://localhost:8000/docs
- Install Python 3.8+
  python --version  # Check your version
- Create Virtual Environment (Recommended)
  python -m venv .venv
  .venv\Scripts\activate       # On Windows
  # source .venv/bin/activate  # On macOS/Linux
- Install Dependencies
  pip install -r requirements.txt
  Key Dependencies:
  - fastapi>=0.100.0 - Web framework
  - uvicorn>=0.20.0 - ASGI server
  - PyPDF2>=3.0.0 - PDF processing
  - chromadb>=1.3.0 - Vector database
  - google-generativeai>=0.8.0 - Gemini integration
  - sentence-transformers>=2.2.0 - Embeddings
  - tiktoken>=0.5.0 - Tokenization
- Visit Google AI Studio
- Create API Key
- Click "Create API Key"
- Choose your project or create a new one
- Copy the generated API key
- Configure Environment
  # Copy example file
  copy .env.example .env
  # Edit .env file with your API key
  GOOGLE_API_KEY=your_actual_api_key_here
- Add PDF Document
  - Place your PDF in the data/ folder
  - Current document: IN 1501 - Signals.pdf
- Verify PDF Location
  dir data\  # Should show your PDF file
Method 1: Direct Python
python main.py

Method 2: Using Uvicorn
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Method 3: Background Process
uvicorn main:app --host 0.0.0.0 --port 8000 &

- Check Server Status
- Visit: http://localhost:8000/health
- Should show:
{"status": "healthy", "documents_count": X}
- Access API Documentation
- Visit: http://localhost:8000/docs
- Interactive Swagger UI for testing
- Open Web Interface
  - Open frontend.html in your browser
  - Should show green status indicator
- Open Frontend
  - Double-click frontend.html or open it in a browser
  - Verify green status indicator (server must be running)
- Ask Questions
- Use sample questions or type your own
- Adjust number of sources (1-10)
- Click "🔍 Ask Question"
- View answer and source documents
- Sample Questions for Testing
- "What are the main types of signals discussed in the document?"
- "Explain the difference between analog and digital signals."
- "What is signal processing and why is it important?"
- "How are signals used in communication systems?"
# Ask a question
$body = @{
question = "What are digital signals?"
top_k = 5
} | ConvertTo-Json
$response = Invoke-RestMethod -Uri "http://localhost:8000/ask" -Method POST -Body $body -ContentType "application/json"
Write-Output $response.answer

# Health check
curl -X GET "http://localhost:8000/health"
# Ask question
curl -X POST "http://localhost:8000/ask" \
-H "Content-Type: application/json" \
-d '{"question": "What is signal processing?", "top_k": 3}'
# Reset database
curl -X POST "http://localhost:8000/reset"

GET /
Description: Returns API information and available endpoints
Response: JSON with endpoint descriptions
POST /ask
Content-Type: application/json
Description: Ask a question based on the loaded PDF document
Request Body:
{
"question": "What are the main types of signals?",
"top_k": 5 // Optional: number of sources (1-10)
}
Response:
{
"question": "What are the main types of signals?",
"answer": "Based on the document, the main types of signals are...",
"sources": [
"Text chunk 1 from PDF...",
"Text chunk 2 from PDF...",
"Text chunk 3 from PDF..."
],
"source_count": 3
}

GET /health
Description: Check API health and database status
Response:
{
"status": "healthy",
"documents_count": 42,
"embedding_dimension": 384
}

POST /reset
Description: Clear database and reload PDF
Response:
{
"message": "Database reset and PDF reloaded successfully",
"documents_count": 42
}

POST /reload
Description: Reload PDF documents into database
Response:
{
"message": "PDF reloaded successfully",
"documents_count": 42
}

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
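For scripted access outside the browser or curl, a minimal Python client can call the same /ask endpoint. This is a sketch that assumes the server is running on http://localhost:8000 and that the requests package is installed:

```python
# Minimal sketch of a Python client for the /ask endpoint.
# Assumes the server is running locally and `requests` is installed.
import requests

payload = {
    "question": "What are the main types of signals?",
    "top_k": 3,  # optional: number of source chunks (1-10)
}

resp = requests.post("http://localhost:8000/ask", json=payload, timeout=60)
resp.raise_for_status()

data = resp.json()
print("Answer:", data["answer"])
print("Sources used:", data["source_count"])
```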
GOOGLE_API_KEY=your_gemini_api_key_here

class Settings:
    GOOGLE_API_KEY: str = "your_api_key"
    CHUNK_SIZE: int = 1000                      # Text chunk size
    CHUNK_OVERLAP: int = 200                    # Overlap between chunks
    EMBEDDING_MODEL: str = "all-MiniLM-L6-v2"   # Sentence transformer model
    GEMINI_MODEL: str = "gemini-2.0-flash-exp"  # Gemini model version
    CHROMA_PERSIST_DIR: str = "./chroma_db"     # Database directory
    MAX_TOKENS: int = 8192                      # Maximum tokens per request
    TEMPERATURE: float = 0.7                    # LLM creativity (0.0-1.0)
    TOP_K_RESULTS: int = 5                      # Default number of sources

Text Processing:
- CHUNK_SIZE: Larger chunks = more context, fewer chunks
- CHUNK_OVERLAP: Higher overlap = better context continuity
- Modify pdf_processor.py for custom chunking logic
Embedding Model:
- Current: all-MiniLM-L6-v2 (384 dimensions, fast)
- Alternative: all-mpnet-base-v2 (768 dimensions, more accurate)
- Alternative: paraphrase-multilingual-MiniLM-L12-v2 (multilingual)
Gemini Models:
- gemini-2.0-flash-exp: Latest experimental (fastest)
- gemini-1.5-flash: Stable and fast
- gemini-1.5-pro: More capable but slower
To use a different PDF or add multiple PDFs:
- Add PDF to data folder:
  copy "your-document.pdf" "data\"
- Update setup_pdf.py:
  # Change line ~29 in setup_pdf.py
  pdf_path = r"data\your-document.pdf"
- Restart application:
  python main.py
# In Python console or script
from vector_store import VectorStore
# Create vector store instance
vs = VectorStore()
# Check document count
print(vs.get_collection_count())
# Reset collection
vs.reset_collection()
# Recreate collection
vs.create_collection()

For Larger Documents:
# In config.py - increase chunk size
CHUNK_SIZE = 2000
CHUNK_OVERLAP = 400
MAX_TOKENS = 16384

For Better Accuracy:
# In config.py - use better embedding model
EMBEDDING_MODEL = "all-mpnet-base-v2"
TOP_K_RESULTS = 10 # More context
TEMPERATURE = 0.3  # Less creative, more factual

PDF File → PyPDF2 → Raw Text → Text Cleaning → Smart Chunking
- PyPDF2: Extracts text from PDF pages
- Text Cleaning: Removes extra whitespace, normalizes text
- Smart Chunking: Splits on sentences/paragraphs when possible
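For illustration, a token-based splitter with overlap could look roughly like the sketch below. This is not the code in pdf_processor.py, just a minimal example of the technique using tiktoken; the chunk_text name and the cl100k_base encoding are assumptions, and the sentence/paragraph-boundary handling mentioned above is omitted.

```python
# Illustrative sketch only - the real chunking logic lives in pdf_processor.py.
# Splits text into ~chunk_size-token pieces that share `overlap` tokens with
# their neighbours, so context is not cut off at chunk boundaries.
import tiktoken

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
        start += chunk_size - overlap  # advance, keeping `overlap` tokens shared
    return chunks
```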
Text Chunks → Tokenization → SentenceTransformer → Vector Embeddings
- Tokenization: tiktoken converts text to tokens
- SentenceTransformer: Creates 384-dimensional vectors
- Normalization: Vectors normalized for cosine similarity
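A minimal sketch of this step (the actual implementation is in embeddings.py and may differ; the sample chunk text is only a placeholder):

```python
# Illustrative sketch - see embeddings.py for the actual implementation.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional vectors

chunks = [
    "Analog signals vary continuously over time...",
    "Digital signals take a finite set of discrete values...",
]
# normalize_embeddings=True makes cosine similarity equivalent to a dot product
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)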
Embeddings + Metadata → ChromaDB → Persistent Storage
- ChromaDB: High-performance vector database
- Metadata: Source info, chunk IDs, timestamps
- Indexing: HNSW algorithm for fast similarity search
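A minimal sketch of storing chunks with metadata in a persistent ChromaDB collection. The collection name "pdf_documents" and the metadata fields shown are assumptions; see vector_store.py for the real integration.

```python
# Illustrative sketch - see vector_store.py for the actual integration.
import chromadb
from sentence_transformers import SentenceTransformer

chunks = ["Analog signals vary continuously...", "Digital signals are discrete..."]
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks, normalize_embeddings=True)

client = chromadb.PersistentClient(path="./chroma_db")         # CHROMA_PERSIST_DIR
collection = client.get_or_create_collection("pdf_documents")  # name is an assumption

collection.add(
    ids=[f"chunk_{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings.tolist(),
    metadatas=[{"source": "IN 1501 - Signals.pdf", "chunk": i} for i in range(len(chunks))],
)
```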
User Question → Embedding → Similarity Search → Top-K Results
- Query Embedding: Same model as document embeddings
- Cosine Similarity: Measures semantic similarity
- Ranking: Returns most relevant chunks
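The query side can be sketched the same way: embed the question with the same model, then let ChromaDB return the closest chunks. Again, this is an illustration under the same assumed collection name, not the code in vector_store.py.

```python
# Illustrative sketch of the retrieval step.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("pdf_documents")

question = "What is the difference between analog and digital signals?"
query_embedding = model.encode([question], normalize_embeddings=True)

results = collection.query(
    query_embeddings=query_embedding.tolist(),
    n_results=5,                      # TOP_K_RESULTS
)
top_chunks = results["documents"][0]  # most similar chunks, best match first
```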
Question + Context → Prompt Engineering → Gemini 2.0 → Final Answer
- Context Assembly: Combines relevant chunks
- Prompt Template: Instructs LLM on response format
- Generation: Gemini creates contextual answer
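A minimal sketch of this last step; the prompt wording below is only an example, not the template used in llm_service.py.

```python
# Illustrative sketch - llm_service.py wraps this differently.
import google.generativeai as genai

genai.configure(api_key="your_gemini_api_key_here")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

question = "What is the difference between analog and digital signals?"
retrieved_chunks = ["Analog signals vary continuously...", "Digital signals are discrete..."]

prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n\n".join(retrieved_chunks) + "\n\n"
    "Question: " + question
)

response = model.generate_content(prompt)
print(response.text)
```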
Error: [WinError 10048] Only one usage of each socket address
Solution: Port 8000 is busy
# Find process using port 8000
netstat -ano | findstr :8000
# Kill the process
taskkill /PID <process_id> /F
# Or use different port
uvicorn main:app --host 0.0.0.0 --port 8080

Error: 404 models/gemini-pro is not found
Solution: Update model name in config.py
GEMINI_MODEL = "gemini-2.0-flash-exp"  # Latest model

Error: API key not valid
Solution: Check your .env file
# Verify API key in .env
type .env
# Get new API key from https://makersuite.google.com/app/apikey

Error: No text could be extracted from the PDF
Solutions:
- Ensure PDF is not password protected
- Check if PDF contains actual text (not just images)
- Try a different PDF file
Error: sqlite3.OperationalError: database is locked
Solution: Reset database
# Stop server (Ctrl+C)
# Delete database directory
rmdir /s "chroma_db"
# Restart server
python main.py

CORS policy error or network error
Solutions:
- Ensure server is running on http://localhost:8000
- Check browser console for detailed errors
- Try opening frontend.html directly in browser
Error: Out of memory
Solutions:
- Reduce CHUNK_SIZE in config.py
- Use a smaller embedding model
- Process smaller PDF files
Questions take too long to answer
Solutions:
- Reduce TOP_K_RESULTS (fewer sources)
- Use a faster model: gemini-1.5-flash
- Increase CHUNK_SIZE (fewer chunks to search)
Enable detailed logging:
# In main.py, change logging level
logging.basicConfig(level=logging.DEBUG)

# Check all endpoints
curl http://localhost:8000/health
curl http://localhost:8000/
curl -X POST http://localhost:8000/reload

- PDF Loading: 10-30 seconds (one-time)
- Simple Questions: 2-5 seconds
- Complex Questions: 5-10 seconds
- Database Operations: <1 second
- Memory: 2-4 GB during operation
- Storage: ~100MB for embeddings per 100-page PDF
- CPU: Moderate during embedding generation
- Document Size: Optimal for PDFs <500 pages
- Concurrent Users: 5-10 simultaneous requests
- Database Size: Up to 10,000 chunks tested
- Never commit .env files to version control
- Use environment variables in production
- Rotate API keys regularly
# For production use:
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

- PDFs are processed locally
- Only the question and the retrieved text chunks are sent to the Gemini API
- Vector database stored locally
✅ Prerequisites
- Python 3.8+ installed
- Google API key obtained
- PDF document in data/ folder
✅ Installation
- Dependencies installed:
pip install -r requirements.txt - Environment configured:
.envfile with API key - Server starts successfully:
python main.py
✅ Testing
- Health check passes: http://localhost:8000/health
- Frontend loads: Open
frontend.html - Questions work: Try sample questions
- API documentation accessible: http://localhost:8000/docs