FastAPI backend for the PDF RAG platform. This service handles authentication, PDF upload and processing, retrieval, agent responses, memory management, usage tracking, and background jobs.
This repo provides:
- JWT-based authentication
- per-user PDF upload and storage
- asynchronous PDF processing with Redis and RQ
- hybrid retrieval using embeddings and keyword search
- ReAct-style agent responses with tools and reasoning steps
- memory and chat-history management
- SQLite-backed persistence for users, chunks, memories, and usage
- JSON and streaming API responses
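The JWT flow behind the first feature can be illustrated with the standard library alone. This is a minimal sketch of HS256 signing and verification; the real service presumably uses a dedicated JWT library, and the function names here are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def create_access_token(username: str, secret: str, expires_minutes: int = 1440) -> str:
    # Header and payload are base64url-encoded JSON, signed with HMAC-SHA256 (HS256).
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(
        json.dumps({"sub": username, "exp": int(time.time()) + expires_minutes * 60}).encode()
    )
    signing_input = f"{header}.{payload}".encode()
    signature = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


def verify_access_token(token: str, secret: str) -> dict:
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

A token minted for one user verifies only with the same secret; tampering or a wrong `JWT_SECRET_KEY` fails the constant-time signature check.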
The companion frontend repo is pdf-rag-app.
- FastAPI app with Swagger docs at `/docs`
- per-user document isolation
- background ingestion pipeline for uploads
- hybrid search over persisted chunks
- streaming endpoints for RAG and agent answers
- daily usage tracking and request limits
- Docker Compose setup for API, worker, and Redis
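Hybrid search merges a vector ranking with a keyword ranking. One common way to combine two ranked lists is reciprocal rank fusion; this sketch assumes the two rankings already exist, and the service's actual fusion scheme may differ:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked chunk-id lists; items near the top of any list score highest."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k damps the effect of rank 1.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: embedding search and keyword search disagree on order.
embedding_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([embedding_hits, keyword_hits])
```

Here `c1` wins because both retrievers rank it highly, even though neither puts it first.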
- A client authenticates with `/register` or `/login`.
- A PDF is uploaded to `/upload`.
- The API stores the file and enqueues a background job.
- The worker processes the document, creates chunks and embeddings, and persists them.
- Query endpoints retrieve relevant context from the per-user document store.
- The RAG service or agent generates an answer and returns JSON or SSE events.
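The worker's chunking step can be sketched as a sliding window with overlap; the chunk size and overlap the service actually uses are not documented here, so the values below are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    # Slide a fixed-size window over the text; each chunk shares `overlap`
    # characters with the previous one so context survives chunk boundaries.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded and persisted alongside its owner's username so retrieval stays per-user.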
- `app/api/routes`: HTTP endpoints
- `app/services`: route-facing service layer
- `app/services/internals`: retrieval, agent, prompts, chunking, memory internals
- `app/db`: SQLite and vector-store logic
- `app/core`: config, auth, logging
- `app/models`: request and response schemas
- `app/tools`: agent tool registry and schemas
pdf-rag-backend/
|- app/
| |- api/routes/ # auth, documents, rag, agent, memory endpoints
| |- core/ # configuration, auth, logging
| |- db/ # SQLite and vector-store persistence
| |- models/ # Pydantic schemas
| |- services/ # service facades and job orchestration
| |- services/internals/ # retrieval, agent, memory, prompt logic
| |- tools/ # agent tool definitions
| |- main.py # FastAPI application factory
| |- queue.py # Redis/RQ helpers
|- main.py # process entrypoint
|- worker.py # RQ worker entrypoint
|- compose.yaml # local multi-service runtime
|- Dockerfile # production image
|- test_*.py # tests
- Python 3.13
- FastAPI
- Uvicorn and Gunicorn
- OpenAI SDK
- FAISS
- SQLite
- Redis + RQ
- PyPDF
Copy .env.example to .env.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | - | OpenAI API access |
| `MODEL_NAME` | No | `gpt-4o-mini` | LLM used for generation |
| `UPLOAD_DIR` | No | `uploaded/` | Uploaded files directory |
| `CACHE_DIR` | No | `cache/` | Cache directory |
| `PERSISTENCE_DIR` | No | `persistence/` | Persistence root |
| `SQLITE_DB_PATH` | No | `persistence/app.db` | SQLite database file |
| `JWT_SECRET_KEY` | Yes for production | `change-me-in-production` | JWT signing secret |
| `JWT_ALGORITHM` | No | `HS256` | JWT algorithm |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | No | `1440` | Access token lifetime |
| `REDIS_URL` | Yes for uploads | `redis://localhost:6379` | Redis connection string |
| `RQ_QUEUE_NAME` | No | `default` | RQ queue name |
Run `docker compose up --build`. This starts:
- API on `http://localhost:8000`
- Swagger docs on `http://localhost:8000/docs`
- worker process for upload jobs
- Redis on `localhost:6379`
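A `compose.yaml` matching that topology might look roughly like this. The service names and the worker command are assumptions; consult the repo's actual `compose.yaml`:

```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"   # API and Swagger docs at /docs
    env_file: .env
    depends_on:
      - redis
  worker:
    build: .
    command: python worker.py   # RQ worker for upload jobs
    env_file: .env
    depends_on:
      - redis
  redis:
    image: redis:7
    ports:
      - "6379:6379"
```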
Install dependencies:
`uv sync`

Run the API:

`uv run uvicorn main:app --reload`

Run the worker in another terminal:

`uv run python worker.py`

Make sure Redis is running before testing uploads.
app/queue.py guards against RQ import issues on native Windows. If the worker fails locally, use Docker, WSL, or Linux for the API/worker runtime.
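The guard in `app/queue.py` can be sketched as a defensive import; the flag name and fallback behavior below are assumptions, not the repo's actual code:

```python
import os

# RQ's default worker relies on os.fork(), which native Windows lacks, and the
# import chain itself can fail there; degrade gracefully instead of crashing.
try:
    from rq import Queue  # noqa: F401
    RQ_AVAILABLE = os.name != "nt"
except ImportError:
    RQ_AVAILABLE = False


def ensure_queue_available() -> None:
    if not RQ_AVAILABLE:
        raise RuntimeError(
            "Background uploads need RQ on a POSIX runtime; use Docker, WSL, or Linux."
        )
```

Routes that enqueue jobs would call `ensure_queue_available()` first so the API returns a clear error rather than an import traceback.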
- Auth: `POST /register`, `POST /login`, `GET /me`
- Documents: `POST /upload`, `GET /documents`, `DELETE /documents`, `GET /job/{job_id}`, `GET /status`
- RAG: `POST /ask`, `POST /ask-stream`
- Agent: `POST /agent`, `POST /agent-stream`
- Memory: `GET /memory/stats`, `GET /memory/info`, `DELETE /memory/chat`, `DELETE /memory/all`, `POST /memory/cleanup`
- Health: `GET /health`
- SQLite stores users, document chunks, memories, chat history, and usage
- uploaded files are written under `UPLOAD_DIR/<username>/`
- chunk embeddings are persisted in SQLite and loaded into per-user FAISS indexes
- Redis stores queue and job state for asynchronous ingestion
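Daily usage tracking over SQLite can be sketched with the standard library. The table name, columns, and the default limit here are illustrative, not the repo's actual schema:

```python
import sqlite3
from datetime import date


def init_usage(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        "username TEXT, day TEXT, requests INTEGER, "
        "PRIMARY KEY (username, day))"
    )


def record_request(conn: sqlite3.Connection, username: str, daily_limit: int = 50) -> int:
    """Increment today's counter for the user; raise once the daily limit is exceeded."""
    today = date.today().isoformat()
    # UPSERT keeps one row per (user, day); the counter resets naturally at midnight.
    conn.execute(
        "INSERT INTO usage VALUES (?, ?, 1) "
        "ON CONFLICT(username, day) DO UPDATE SET requests = requests + 1",
        (username, today),
    )
    (count,) = conn.execute(
        "SELECT requests FROM usage WHERE username = ? AND day = ?", (username, today)
    ).fetchone()
    if count > daily_limit:
        raise RuntimeError("daily request limit exceeded")
    return count
```

A request handler would call `record_request` before doing any LLM work, so over-limit users are rejected cheaply.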
Run the full test suite:
`uv run pytest`

Examples of focused tests already in the repo:
- `test_documents_route.py`
- `test_upload_background_task.py`
- `test_auth_tokens.py`
- `test_usage_tracking.py`
The Dockerfile runs the API with Gunicorn and Uvicorn workers.
This repo also includes:
- `railway.toml`
- `railway.env.example`
- `RAILWAY_DEPLOYMENT.md`
Before deploying to production:
- set a real `JWT_SECRET_KEY`
- tighten CORS in `app/core/config.py`
- provision Redis for upload jobs
- persist upload and SQLite storage
- Frontend: `C:\development\pdf-rag-app`