A private, local PDF research assistant that reads your documents aloud and answers questions about them. Upload PDFs or capture webpages, then chat with your content using AI, all running on your own machine with no subscriptions required.
- Docker and Docker Compose
- A local LLM runtime (choose one):
  - Docker Model Runner (built into Docker Desktop)
  - Ollama (lightweight CLI)
  - LMStudio (GUI app)
Choose your LLM setup
- Copy the .env.example file and rename it to .env.
- Pick an LLM server from the options below and set LLM_API_URL to the matching URL.
- Enable Model Runner in Docker Desktop → Settings → Features
- Pull models:
  docker model pull ai/qwen3:latest
  docker model pull ai/nomic-embed-text-v1.5:latest
- Set LLM_API_URL in the .env file to:
  LLM_API_URL=http://host.docker.internal:12434
- Install Ollama
- Pull models:
  ollama pull llama3.2
  ollama pull nomic-embed-text
- Set LLM_API_URL in the .env file to:
  LLM_API_URL=http://host.docker.internal:11434
- Install LMStudio
- Download a chat model and embedding model
- Start Local Server (port 1234)
- Set LLM_API_URL in the .env file to:
  LLM_API_URL=http://host.docker.internal:1234/v1
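To sanity-check which runtime a given .env points at, a small script can read LLM_API_URL and look at its port. This is only a sketch: the port-to-runtime mapping comes from the setup steps above, while the function names `read_llm_api_url` and `guess_runtime` are hypothetical helpers and not part of the project.

```python
from typing import Optional
from urllib.parse import urlsplit

# Default ports from the setup steps above; the lookup table itself is illustrative.
DEFAULT_PORTS = {
    12434: "Docker Model Runner",
    11434: "Ollama",
    1234: "LMStudio",
}


def read_llm_api_url(env_text: str) -> Optional[str]:
    """Return the LLM_API_URL value from .env-style text, or None if it is unset."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out alternatives
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LLM_API_URL":
            # Drop a trailing inline comment such as "  # LMStudio".
            return value.split(" #", 1)[0].strip()
    return None


def guess_runtime(url: str) -> str:
    """Guess which runtime a URL points at, based only on its port."""
    return DEFAULT_PORTS.get(urlsplit(url).port, "unknown runtime")


sample_env = """
# Choose your LLM provider
LLM_API_URL=http://host.docker.internal:1234/v1  # LMStudio
# LLM_API_URL=http://host.docker.internal:11434  # Ollama
"""
url = read_llm_api_url(sample_env)
print(url, "->", guess_runtime(url))
```

A custom port simply reports "unknown runtime"; the check is a convenience, not a validation of the server itself.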
docker compose up --build
- Open: http://localhost:3000
- Upload PDFs or add webpages
- Click Play or select PDF text and use the read-aloud icon to hear documents aloud
- Ask questions about your content
- Text-to-Speech: High-quality voice reads your PDFs aloud
- Sentence Tracking: Visual highlighting shows what's being read
- Multiple Documents: Switch between PDFs and webpages with tabs
- PDF Annotations: Highlight, draw, and comment directly on documents
- AI Assistant: Ask questions about your uploaded content
- Smart Memory: Remembers previous conversations in each thread
- Web Search: Optionally include live internet results
- Reasoning Display: See how the AI thinks through problems
- Modern Interface: Clean, intuitive design
- Thread Organization: Keep different topics separate
- Customizable: Adjust AI behavior per conversation
- Private: Everything runs locally on your machine
- Create a Thread - Use the sidebar to start a new conversation
- Add Content - Upload PDFs or add webpage URLs
- Start Reading - Click Play to hear documents aloud
- Ask Questions - Type questions in the chat
- Play Controls: Click Play, or select PDF text and click the read-aloud icon in the selection menu
- Voice Settings: Choose different voices and adjust speed (0.5x-2.0x)
- Auto-Scroll: Document follows along automatically
- Select Model: Choose your preferred AI model
- Internet Search: Toggle to include live web results
- View Reasoning: Expand panels to see AI thinking process
- Semantic Memory: See which past conversations were used
- Thread Settings: Click the gear icon to adjust AI behavior
- System Role: Change the AI's persona
- Tool Instructions: Guide how AI uses different tools
- Custom Instructions: Add extra directions
Architecture & Services
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                       Docker Compose                                        │
├─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────────┤
│    Frontend     │   RAG Service   │ Browser Capture │   PostgreSQL    │      Weaviate       │
│    (Next.js)    │    (FastAPI)    │   (Selenium)    │  (Primary DB)   │     (Vector DB)     │
│   Port: 3000    │   Port: 8000    │   Port: 8090    │   Port: 5432    │     Port: 8080      │
└─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────────┘
                                               │
                                               ▼
                       ┌──────────────────────────────────────────────┐
                       │        DMR / Ollama / LMStudio / LLM         │
                       │             (OpenAI-compatible)              │
                       │            Port: 12434 (default)             │
                       └──────────────────────────────────────────────┘
| Service | Port | Description |
|---|---|---|
| Frontend | 3000 | Next.js React app with PDF viewer, chat UI, thread management, and TTS |
| RAG Service | 8000 | FastAPI server for PDF processing, document indexing, AI chat, thread/message/file management |
| Browser Capture | 8090 | Selenium-based service for interactive webpage capture and PDF conversion |
| PostgreSQL | 5432 | Primary database for threads, messages, files, settings, and annotations |
| Weaviate | 8080 | Vector database for semantic and memory search |
| DMR/Ollama/LMStudio | 12434 | Local LLM server (external, user-provided) |
Advanced AI Features
- Orchestrator Agent: LangGraph-powered agent that plans, selects tools, and synthesizes answers
- Intent Agent (optional): Pre-processes questions to improve query clarity and search precision
- Tool-Calling: Dynamic tool selection including document search, memory recall, web search, and clarification
- Configurable Iterations: Control tool-call rounds with forced final answer to prevent infinite loops
- Multi-Provider Extraction: Supports reasoning traces from Claude, OpenAI o-series, DeepSeek, QwQ, Qwen3-Thinking
- Database Storage: Reasoning traces persisted alongside answers in PostgreSQL
- UI Display: Expandable reasoning panels in chat bubbles for transparent AI thinking
- Thread-Scoped Collections: Each thread has isolated vector collections in Weaviate
- Multi-Source Retrieval: Simultaneous search across PDFs, webpages, and past conversations
- Semantic Recollection: UI highlights which past messages were used in current answers
- Context Management: Intelligent token budgeting for optimal LLM context window usage
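The "Configurable Iterations" item above describes a bounded tool-call loop that ends with a forced final answer. The idea can be sketched in plain Python as follows. This is not the project's LangGraph implementation: `run_agent`, the planner callback, and the tool registry are hypothetical stand-ins used only to show the control flow.

```python
from typing import Callable, Dict, List, Tuple

# A planner returns either ("tool", tool_name, query) or ("answer", text).
Step = Tuple[str, ...]
Planner = Callable[[str, List[str], bool], Step]


def run_agent(
    question: str,
    plan: Planner,
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> str:
    """Run tool-call rounds, then force a final answer once the budget is spent."""
    observations: List[str] = []
    for _ in range(max_iterations):
        step = plan(question, observations, False)
        if step[0] == "answer":
            return step[1]  # the planner decided it has enough information
        _, tool_name, tool_query = step
        observations.append(tools[tool_name](tool_query))
    # Budget exhausted: call the planner once more with must_answer=True,
    # so the loop cannot continue indefinitely.
    return plan(question, observations, True)[1]


def greedy_planner(question: str, observations: List[str], must_answer: bool) -> Step:
    """Toy planner that keeps searching until it is forced to answer."""
    if must_answer:
        return ("answer", f"Best effort from {len(observations)} lookups")
    return ("tool", "document_search", question)


tools = {"document_search": lambda q: f"snippet for {q!r}"}
print(run_agent("What is RAG?", greedy_planner, tools, max_iterations=3))
```

The forced final call is what guarantees termination: even a planner that never volunteers an answer returns after `max_iterations` tool rounds.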
Technology Stack
RAG Service
| Technology | Purpose |
|---|---|
| FastAPI | Web framework |
| LangChain | LLM/Embedding integration |
| LangGraph | Stateful multi-agent workflow |
| Weaviate Client | Vector database operations |
| SQLModel | ORM built on SQLAlchemy |
| SQLAlchemy | Async database operations |
| Alembic | Database migration management |
| asyncpg | Async PostgreSQL driver |
Browser Capture
| Technology | Purpose |
|---|---|
| Selenium | WebDriver automation |
| Brave Browser | Headless browser rendering |
| WeasyPrint | PDF conversion fallback |
| FastAPI | Service API framework |
Frontend
| Technology | Purpose |
|---|---|
| Next.js | React framework |
| Material-UI (MUI) | UI components (v7) |
| EmbedPDF | PDF rendering with annotations |
| react-markdown | Chat message rendering |
| React Query | Async state management |
Project Structure
askpdf/
├── docker-compose.yml          # Multi-service orchestration
├── run_tests.sh                # Comprehensive test runner
├── browser_capture/            # Selenium-based webpage capture service
├── rag_service/                # FastAPI backend with AI, RAG, and database
│   ├── app/
│   │   ├── api/                # REST API route handlers
│   │   ├── agent/              # Multi-agent AI system
│   │   ├── db/                 # PostgreSQL data layer
│   │   ├── services/           # Business logic services
│   │   └── rag/                # RAG core logic
│   └── tests/                  # Comprehensive test suite
└── frontend/                   # Next.js React application
    ├── src/
    │   ├── components/         # UI components
    │   ├── hooks/              # React hooks
    │   └── lib/                # Utility functions
    └── package.json
Configuration & Environment
Environment variables are now managed using a .env file for better security and maintainability. The system uses two approaches:
- .env file - For user-configurable settings (models, database URLs, behavior settings)
- docker-compose.yml - For service-specific configuration (networking, basic service settings)

- Copy the example file:
  cp .env.example .env
- Configure your LLM provider in .env:
  # Choose your LLM provider
  LLM_API_URL=http://host.docker.internal:1234/v1  # LMStudio
  # LLM_API_URL=http://host.docker.internal:11434  # Ollama
  # LLM_API_URL=http://host.docker.internal:12434  # Docker Model Runner
- Review the other settings in .env and adjust them as needed for your use case.
LLM Configuration
| Variable | Default | Description |
|---|---|---|
| LLM_API_URL | (none) | External LLM server URL (Docker Model Runner/Ollama/LMStudio) |
Model Configuration
| Variable | Default | Description |
|---|---|---|
| LOCAL_EMBEDDING_MODEL | BAAI/bge-m3 | Single local embedding model to use |
| LOCAL_RERANKER_MODEL | BAAI/bge-reranker-v2-m3 | Single local reranker model to use |
| EMBEDDING_DEVICE | cpu | Device for embedding models (cpu/cuda/mps) |
| RERANKER_DEVICE | cpu | Device for reranker models (cpu/cuda/mps) |
AI Behavior & Limits
| Variable | Default | Description |
|---|---|---|
| DEFAULT_TOKEN_BUDGET | 8192 | Context window size for AI responses |
| DEFAULT_MAX_ITERATIONS | 10 | Maximum tool-call rounds for AI reasoning |
| MIN_MAX_ITERATIONS | 1 | Minimum allowed iterations |
| MAX_MAX_ITERATIONS | 30 | Maximum allowed iterations |
| MAX_CUSTOM_INSTRUCTIONS_CHARS | 2000 | Maximum custom instruction length |
| MAX_SYSTEM_ROLE_CHARS | 500 | Maximum system role description length |
| MAX_TOOL_INSTRUCTION_CHARS | 500 | Maximum tool instruction length |
| INTENT_AGENT_MAX_ITERATIONS | 1 | Maximum iterations for intent agent |
| MAX_ITERATIONS_SUFFICIENT_COVERAGE | 2 | Iteration bonus for sufficient coverage |
| MAX_ITERATIONS_PROBABLY_SUFFICIENT_COVERAGE | 4 | Iteration bonus for probable sufficient coverage |
| WEB_SEARCH_ITERATION_BONUS | 2 | Extra iterations when web search is enabled |
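As a rough illustration of how the iteration limits in this table could interact, the sketch below clamps a per-thread request into the allowed range and then adds the web search bonus. The constants are the documented defaults; the function name `effective_max_iterations` and the order of operations (clamp first, bonus second) are assumptions, not the project's confirmed logic.

```python
from typing import Optional

# Documented defaults from the table above.
DEFAULT_MAX_ITERATIONS = 10
MIN_MAX_ITERATIONS = 1
MAX_MAX_ITERATIONS = 30
WEB_SEARCH_ITERATION_BONUS = 2


def effective_max_iterations(requested: Optional[int] = None, web_search: bool = False) -> int:
    """Clamp the requested tool-call rounds, then add the web search bonus."""
    base = DEFAULT_MAX_ITERATIONS if requested is None else requested
    clamped = max(MIN_MAX_ITERATIONS, min(MAX_MAX_ITERATIONS, base))
    # Assumption: the bonus is applied after clamping, so it can exceed the cap.
    bonus = WEB_SEARCH_ITERATION_BONUS if web_search else 0
    return clamped + bonus


for requested, web_search in [(None, False), (50, False), (0, False), (10, True)]:
    print(requested, web_search, "->", effective_max_iterations(requested, web_search))
```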
Document Processing (Docling)
| Variable | Default | Description |
|---|---|---|
| DOCLING_DO_OCR | True | Enable OCR for scanned images (preserves digital text) |
| DOCLING_DO_TABLE_STRUCTURE | True | Extract table structure from documents |
| DOCLING_TABLE_MODE | ACCURATE | Table extraction mode (FAST/ACCURATE) |
| DOCLING_FORCE_FULL_PAGE_OCR | False | Force full-page OCR (keep false for digital PDFs) |
| DOCLING_DO_FORMULA_ENRICHMENT | False | Enable mathematical formula extraction |
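Because these settings arrive as strings, a loader has to turn values such as "True" or "false" into booleans and validate the table mode. One way to do that is sketched below. The variable names and defaults come from the table; the `DoclingSettings` dataclass and both helper functions are hypothetical and do not reflect how the project or Docling actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag, accepting common spellings such as 'True' or 'false'."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUE_VALUES


@dataclass(frozen=True)
class DoclingSettings:
    do_ocr: bool
    do_table_structure: bool
    table_mode: str
    force_full_page_ocr: bool
    do_formula_enrichment: bool


def load_docling_settings(env: Optional[Mapping[str, str]] = None) -> DoclingSettings:
    """Build settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("DOCLING_TABLE_MODE", "ACCURATE").strip().upper()
    if mode not in {"FAST", "ACCURATE"}:
        raise ValueError(f"DOCLING_TABLE_MODE must be FAST or ACCURATE, got {mode!r}")
    return DoclingSettings(
        do_ocr=env_bool(env, "DOCLING_DO_OCR", True),
        do_table_structure=env_bool(env, "DOCLING_DO_TABLE_STRUCTURE", True),
        table_mode=mode,
        force_full_page_ocr=env_bool(env, "DOCLING_FORCE_FULL_PAGE_OCR", False),
        do_formula_enrichment=env_bool(env, "DOCLING_DO_FORMULA_ENRICHMENT", False),
    )


print(load_docling_settings({"DOCLING_TABLE_MODE": "fast", "DOCLING_DO_OCR": "false"}))
```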
Database Configuration
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://postgres:postgres@postgresql:5432/askpdf | PostgreSQL connection string |
| TEST_DATABASE_URL | postgresql+asyncpg://postgres:postgres@postgresql:5432/test_askpdf | Test database connection string |
| POSTGRES_POOL_SIZE | 10 | Database connection pool size |
| POSTGRES_MAX_OVERFLOW | 20 | Maximum additional connections beyond pool size |
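To make the connection string and pool settings more concrete, the sketch below splits the default DATABASE_URL into its parts and computes the peak number of connections one pool can open, which for SQLAlchemy's default queue pool is the pool size plus the overflow. The helper names are hypothetical, and only the standard library is used so the example runs without a database.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlsplit

DATABASE_URL = "postgresql+asyncpg://postgres:postgres@postgresql:5432/askpdf"


def describe_database_url(url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a SQLAlchemy-style URL into dialect, driver, host, port, and database."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "host": parts.hostname,  # "postgresql" is the Compose service name
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def peak_connections(pool_size: int = 10, max_overflow: int = 20) -> int:
    """Upper bound on simultaneous connections for a single pool."""
    return pool_size + max_overflow


print(describe_database_url(DATABASE_URL))
print("peak connections per pool:", peak_connections())
```

With the defaults, one pool can hold up to 30 connections, so PostgreSQL's own connection limit needs to accommodate that figure for every process that creates a pool.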
Frontend Service
| Variable | Default | Description |
|---|---|---|
| NEXT_PUBLIC_API_URL | http://localhost:8000 | RAG service API URL for frontend communication |
RAG Service - Core Configuration
| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| WEAVIATE_URL | http://weaviate:8080 | Weaviate vector database endpoint |
| WEAVIATE_HYBRID_ALPHA | 0.7 | Hybrid search balance (0.0=pure keyword, 1.0=pure vector) |
| CAPTURE_SERVICE_URL | http://browser-capture:8080 | Browser capture service endpoint |
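The effect of WEAVIATE_HYBRID_ALPHA is easiest to see with numbers. In Weaviate's hybrid search, an alpha of 1 is a pure vector search and an alpha of 0 is a pure keyword search. The sketch below uses a plain weighted sum to show how the balance shifts a ranking; it is a deliberate simplification and not Weaviate's actual fusion algorithm, and the scores are made up for illustration.

```python
def hybrid_score(vector_score: float, keyword_score: float, alpha: float = 0.7) -> float:
    """Blend two normalized scores: alpha weights the vector side, 1 - alpha the keyword side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


# Hypothetical (vector_score, keyword_score) pairs for two chunks.
candidates = {"chunk-a": (0.82, 0.10), "chunk-b": (0.55, 0.90)}

for alpha in (0.0, 0.7, 1.0):
    ranked = sorted(
        candidates,
        key=lambda name: hybrid_score(*candidates[name], alpha=alpha),
        reverse=True,
    )
    print(f"alpha={alpha}: {ranked}")
```

Raising alpha favors chunks that are semantically close to the question, while lowering it favors chunks that share exact terms with it.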
- Initial Setup: Copy .env.example to .env and configure your settings:
  cp .env.example .env
  # Edit .env with your preferred settings
- Apply Changes: After modifying environment variables, restart the services:
  docker compose down
  docker compose up --build
The Compose setup builds the frontend with npm ci inside Docker and runs the production Next.js standalone server, so users do not need Node or npm installed locally.
For frontend development with hot reload, add the dev override:
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
You need a chat model with tool-calling support and an embedding model:
| Runtime | Chat model example | Embedding model example |
|---|---|---|
| DMR | ai/qwen3:latest | ai/nomic-embed-text-v1.5:latest |
| Ollama | llama3.2 | nomic-embed-text |
| LMStudio | google/gemma-3-12b | text-embedding-embeddinggemma-300m-qat |
API Reference
- POST /api/threads - Create new thread
- POST /api/threads/{thread_id}/chat - Chat with documents
- PUT /api/threads/{thread_id}/settings - Update thread settings
- GET /api/threads/{thread_id}/messages - List messages
- POST /api/threads/{thread_id}/files/upload - Upload PDF
- GET /api/threads/{thread_id}/files/{file_hash} - Get file data
- GET /api/threads/{thread_id}/files/{file_hash}/status - Check processing status
- GET /api/models - List available models
- GET /api/health/chat-model/{model} - Check chat model health
- GET /api/health/embed-model/{model} - Check embedding model health
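A client for these endpoints can be prototyped with the standard library alone. The sketch below only builds requests and does not send them. It assumes the RAG service is reachable at http://localhost:8000 (the NEXT_PUBLIC_API_URL default) and that the API speaks JSON; the helper names are hypothetical, and request and response schemas are not documented here, so none are shown.

```python
import json
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: NEXT_PUBLIC_API_URL default


def messages_path(thread_id: str) -> str:
    """Path for GET /api/threads/{thread_id}/messages."""
    return f"/api/threads/{quote(str(thread_id), safe='')}/messages"


def file_status_path(thread_id: str, file_hash: str) -> str:
    """Path for GET /api/threads/{thread_id}/files/{file_hash}/status."""
    thread = quote(str(thread_id), safe="")
    return f"/api/threads/{thread}/files/{quote(file_hash, safe='')}/status"


def api_request(method: str, path: str, payload: Optional[Any] = None) -> Request:
    """Build (but do not send) a request; JSON-encode the payload when given."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


req = api_request("GET", file_status_path("thread-1", "abc123"))
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it once the stack is running.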
Testing
./run_tests.sh [options]
The test runner is Docker-native. run_tests.sh starts an isolated
askpdf-test Compose project with its own PostgreSQL, Weaviate, network, and
volumes, so macOS, Linux, Windows with Docker/WSL, and GitHub Actions all use
the same test environment. The normal app stack can keep running while tests
run because the test services do not publish host ports.
You can also run the test container directly:
docker compose -p askpdf-test -f docker-compose.test.yml run --rm --build test-runner
docker compose -p askpdf-test -f docker-compose.test.yml run --rm --build test-runner --api
docker compose -p askpdf-test -f docker-compose.test.yml run --rm --build test-runner --group db
Set ASKPDF_TEST_PROJECT_NAME to override the isolated Compose project name.
Set ASKPDF_KEEP_TEST_CONTAINERS=1 to keep test containers and volumes after a
run for debugging.
- --verbose - Verbose output
- --file <file> - Run specific test file
- --test <test> - Run a specific test inside --file
- --coverage - Run with coverage report
- --unit - Run unit and mock-based tests
- --db/--db-tests/--db-only - Run PostgreSQL database tests
- --api - Run API endpoint tests
- --integration - Run integration tests
- --schema - Run schema guardrail tests
- --standalone - Run standalone verification scripts
- --all/--all-tests - Run the full pytest suite plus standalone checks
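For scripted runs, the direct `docker compose` invocation shown earlier can be assembled programmatically. The sketch below mirrors that command and substitutes the project name from ASKPDF_TEST_PROJECT_NAME when it is set. How run_tests.sh itself uses that variable is an assumption here, and `build_test_command` is a hypothetical helper, not part of the repository.

```python
import os
import shlex
from typing import List, Mapping, Optional, Sequence


def build_test_command(
    extra_args: Sequence[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Assemble the test-runner command, honoring an optional project name override."""
    env = os.environ if env is None else env
    project = env.get("ASKPDF_TEST_PROJECT_NAME", "askpdf-test")
    return [
        "docker", "compose",
        "-p", project,
        "-f", "docker-compose.test.yml",
        "run", "--rm", "--build", "test-runner",
        *extra_args,
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(build_test_command(["--group", "db"], env={})))
```

Passing the list to `subprocess.run(..., check=True)` would execute it from the repository root.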
- Database Tests: PostgreSQL operations, models, repositories
- API Tests: Endpoint testing, integration tests
- Parsing Tests: PDF processing with Docling and pdfplumber
GitHub Actions runs Docker build and test jobs on pull requests and pushes to
main. To block merges unless CI passes, configure a branch ruleset in GitHub:
- Go to Settings → Rules → Rulesets.
- Create a ruleset for main.
- Require pull requests before merging.
- Require status checks to pass.
- Select the Docker build and Test suite checks from the CI workflow.
- Block force pushes and branch deletions.
Contributions are welcome! Please feel free to submit a Pull Request.
This project uses the following third-party technologies:
- Kokoro - Text-to-speech model
- spaCy - Natural language processing
- LangChain - LLM framework
- LangGraph - Stateful AI workflows
- Weaviate - Vector database
- FastAPI - Web framework
- Next.js - React framework
- hexgrad for the amazing Kokoro-82M model
- spaCy for robust NLP capabilities
- LangChain team for the excellent LLM framework
- Weaviate for the powerful vector database
- The open-source community for all the amazing tools
For questions, issues, or suggestions, please open an issue on the GitHub repository.