
0Nguyen0Cong0Tuan0/arxiv-insight-engine

ArXiv Insight Engine

An Advanced Multimodal RAG System with Voice Interaction and Real-time Monitoring

AI-Powered Research Assistant for Exploring Academic Papers

Overview

ArXiv Insight Engine is a multimodal Retrieval-Augmented Generation (RAG) system for exploring and querying academic papers. It combines document processing, query routing, voice interaction, and cost and latency monitoring in a single research assistant.

Key features

- Intelligent Query Routing: Automatically classifies each query and routes it to a specialized processing node
- Voice-First Interface: Full speech-to-text and text-to-speech integration
- Multimodal Understanding: Analyzes figures, tables, and other visual content in papers
- Real-time Monitoring: Tracks cost, latency, and performance metrics
- Papers Management: Browse, view, and manage your research paper database
- Hybrid Retrieval: Combines dense vector search with BM25 keyword search, merged by Reciprocal Rank Fusion
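BM25 supplies the sparse, keyword-based half of the hybrid retriever. As a minimal sketch of how such a scorer ranks chunks (plain Python with whitespace tokenization, not the project's own implementation, which is not shown here), the function below implements Okapi BM25. The name `bm25_scores` and the defaults `k1=1.5` and `b=0.75` are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from this document contribute nothing
            # Smoothed IDF; the +1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores
```

A chunk scores zero unless it shares at least one query term, which is why BM25 complements dense search: it rewards exact matches on rare terms such as model names, while embeddings catch paraphrases.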

Technology stack

Backend:
  Framework: FastAPI 0.110+
  ASGI Server: Uvicorn
  Python: 3.11+
  
AI/ML Stack:
  LLM Framework: LangChain 0.1+
  Orchestration: LangGraph
  LLM: Meta Llama 3.3 70B Instruct (via HuggingFace)
  Embeddings: all-MiniLM-L6-v2 (SentenceTransformers)

Vector Database:
  Primary: ChromaDB (persistent)
  Search Algorithm: HNSW with cosine similarity
  
Document Processing:
  Parser: Unstructured.io
  Text Splitter: LangChain RecursiveCharacterTextSplitter
  
Retrieval:
  Hybrid: ChromaDB (dense) + BM25 (sparse)
  Fusion: Reciprocal Rank Fusion (RRF)

Voice Processing:
  STT: OpenAI Whisper (base/medium/large)
  TTS: Google Text-to-Speech (gTTS)
  Audio Format: WAV, MP3, WebM, M4A
  
Vision:
  Image Captioning: Salesforce BLIP-Large
  Processing: PIL, Base64 encoding
  
Text:
  Summarization: Facebook BART-Large-CNN
  Tokenization: HuggingFace Transformers

Frontend:
  UI Framework: Vanilla JavaScript
  Styling: Custom CSS with CSS Variables
  Icons: Lucide Icons
  Charts: Chart.js
  Components:
    - Dynamic modals
    - Real-time updates
    - Drag-and-drop upload
    - Voice recording interface

Metrics:
  Storage: JSONL file-based persistence
  Tracking: Custom metrics_tracker module
  Visualization: Chart.js + Plotly
  
Optional:
  LangSmith: For advanced tracing
  LangChain Callbacks: Operation logging
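Reciprocal Rank Fusion merges the dense (ChromaDB) and sparse (BM25) result lists by rank rather than by raw score, so the two scoring scales never need to be normalized. Each document receives the sum over all lists of 1 / (k + rank). The sketch below uses k = 60, a commonly used default; the value configured in `hybrid_retriever.py` is not stated here and is an assumption, as is the function name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based, so the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["chunk_a", "chunk_b", "chunk_c"]   # e.g. top results from ChromaDB
sparse_hits = ["chunk_c", "chunk_a", "chunk_d"]  # e.g. top results from BM25
for doc_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Here `chunk_a` comes out on top because it ranks near the top of both lists, even though neither list alone places it first by a wide margin.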

Architecture

System architecture

┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Browser)                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────┐    │
│  │   Chat   │  │  Voice   │  │  Papers  │  │ Metrics   │    │
│  │    UI    │  │  Panel   │  │  Manager │  │ Dashboard │    │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └─────┬─────┘    │
└───────┼─────────────┼─────────────┼──────────────┼──────────┘
        │             │             │              │
        └─────────────┴─────────────┴──────────────┘
                            │
                    ┌───────▼────────┐
                    │   FastAPI      │
                    │   Backend      │
                    └───────┬────────┘
                            │
        ┌───────────────────┼─────────────────┐
        │                   │                 │
   ┌────▼─────┐      ┌──────▼─────┐     ┌─────▼────┐
   │  Voice   │      │  LangGraph │     │ ChromaDB │
   │ Handler  │      │   Agent    │     │  Store   │
   └──────────┘      └──────┬─────┘     └──────────┘
                            │
        ┌───────────────────┼─────────────────┐
        │                   │                 │
   ┌────▼─────┐      ┌──────▼─────┐     ┌─────▼────┐
   │ Whisper  │      │   Router   │     │   BM25   │
   │   STT    │      │    Node    │     │ Retriever│
   └──────────┘      └──────┬─────┘     └──────────┘
                            │
        ┌───────────────────┼────────────────┐
        │        │          │       │        │
   ┌────▼───┐ ┌──▼──┐ ┌─────▼─┐ ┌───▼───┐ ┌──▼──┐
   │Simple  │ │Sum- │ │Compare│ │Analyze│ │Fact │
   │  Q&A   │ │mary │ │       │ │       │ │Check│
   └────────┘ └─────┘ └───────┘ └───────┘ └─────┘

LangGraph workflow

graph TD
    A[User Query] --> B[Router Node]
    B --> C{Query Type?}
    C -->|Simple Q&A| D[Retrieve Chunks]
    C -->|Summarization| E[Retrieve + Summarize]
    C -->|Comparison| F[Retrieve + Compare]
    C -->|Analysis| G[Retrieve + Analyze]
    C -->|Fact Check| H[Retrieve + Verify]
    
    D --> I[Simple QA Node]
    E --> J[Summarizer Node]
    F --> K[Comparison Node]
    G --> L[Analyzer Node]
    L --> M{Needs Figures?}
    M -->|Yes| N[Visual Analyzer]
    M -->|No| O[End]
    H --> P[Fact Checker Node]
    
    I --> O
    J --> O
    K --> O
    N --> O
    P --> O
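The workflow above can be read as a dispatch table: the router assigns one label, the matching node produces the answer, and only the analysis branch may continue to the visual analyzer. The sketch below expresses that control flow in plain Python rather than with the LangGraph API, and omits the retrieval step that precedes each node. The route labels other than `simple_qa` (which appears in the `/api/voice/query` response), the keyword classifier, and all function names are hypothetical stand-ins for the nodes in `src/agents/nodes/`.

```python
from typing import Callable


def classify(query: str) -> str:
    """Toy keyword router standing in for the Router Node (hypothetical labels)."""
    q = query.lower()
    if "summar" in q:
        return "summarization"
    if "compare" in q or " vs " in q:
        return "comparison"
    if "analy" in q or "figure" in q:
        return "analysis"
    if "true that" in q or "verify" in q:
        return "fact_check"
    return "simple_qa"


def needs_figures(query: str) -> bool:
    """Decide whether the analysis branch should hand off to the visual analyzer."""
    return any(word in query.lower() for word in ("figure", "table", "plot"))


# One stubbed node per route; each returns a label describing its output.
NODES: dict[str, Callable[[str], str]] = {
    "simple_qa": lambda q: "simple QA answer",
    "summarization": lambda q: "summary",
    "comparison": lambda q: "comparison",
    "analysis": lambda q: "analysis",
    "fact_check": lambda q: "fact-check verdict",
}


def run_workflow(query: str) -> tuple[str, list[str]]:
    """Return the route taken and the node outputs in execution order."""
    route = classify(query)
    outputs = [NODES[route](query)]
    # Mirrors the "Needs Figures?" decision after the Analyzer Node.
    if route == "analysis" and needs_figures(query):
        outputs.append("visual analysis")
    return route, outputs
```

In the project itself the classification logic lives in `router.py` and each branch is a LangGraph node; the dictionary of callables only mirrors the edges drawn in the diagram.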

Installation

Prerequisites

# System Requirements
- Python 3.11+
- 8GB+ RAM (16GB recommended)
- 10GB+ disk space
- CUDA-capable GPU (optional, for faster processing)

# System Dependencies
- ffmpeg (for audio processing)
- git

Set up

git clone https://github.com/0Nguyen0Cong0Tuan0/arxiv-insight-engine.git
cd arxiv-insight-engine

# Using venv
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # Linux/Mac

# Install all requirements
pip install -r requirements.txt

# Install ffmpeg (required for audio)
# Ubuntu/Debian:
sudo apt-get install ffmpeg

# macOS:
brew install ffmpeg

# Windows:
# Download from https://ffmpeg.org/download.html

# Create .env file
cp .env.example .env

# Edit .env with your API keys
nano .env

Required environment variables

# Hugging Face API Token (Required)
HUGGINGFACEHUB_API_TOKEN=your_token_here

# Optional (for advanced features)
LANGCHAIN_API_KEY=your_langsmith_key
LANGCHAIN_TRACING_V2=false
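Failing fast when the required token is missing turns a confusing authentication error into a clear message. A minimal stdlib sketch, assuming a simple `KEY=value` .env format with no quoting or variable expansion; a library such as python-dotenv is the usual choice in practice, and whether this project uses one is not stated here. `parse_env_text`, `load_env_file`, and `check_required` are hypothetical names.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

REQUIRED = ("HUGGINGFACEHUB_API_TOKEN",)
PLACEHOLDERS = {"", "your_token_here"}  # values that mean "not configured yet"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments (no quote handling)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read and parse a .env file; a missing file yields an empty dict."""
    env_path = Path(path)
    return parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}


def check_required(values: Mapping[str, str],
                   environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """List required variables that are unset or still hold a placeholder.

    Variables exported in the process environment take precedence over .env values.
    """
    environ = os.environ if environ is None else environ
    merged = {**values, **environ}
    return [name for name in REQUIRED if merged.get(name, "") in PLACEHOLDERS]
```

Calling `check_required(load_env_file())` at startup and exiting when the result is non-empty catches an unedited `.env` before the first model call.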

Initialize Database

# Create necessary directories
mkdir -p data/raw_papers data/processed chroma_db

# ChromaDB is initialized automatically on first run; to initialize it explicitly:
python -c "from src.stores.vector_store import init_collection; init_collection()"

Run the Application

# Start the FastAPI server
uvicorn src.app.main:app --reload --host 0.0.0.0 --port 8000

# Access the application
# Main UI: http://localhost:8000
# Metrics Dashboard: http://localhost:8000/metrics.html
# API Docs: http://localhost:8000/docs

Result

Screenshot: Home page

Screenshots: Metrics dashboard

Usage

Upload PDF papers

# Via Web UI
1. Click the "Upload" tab
2. Drag & drop PDFs or click to browse
3. Click "Upload & Process"
4. Wait for processing to complete

# Via API
curl -X POST "http://localhost:8000/api/ingest/upload" \
  -F "files=@paper1.pdf" \
  -F "files=@paper2.pdf"
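The same upload can be scripted from Python. This stdlib-only sketch builds the `multipart/form-data` body by hand, repeating the `files` field once per PDF as the curl command does, and returns an unsent request. `build_multipart` and `upload_request` are hypothetical helpers, and the base URL assumes the default `uvicorn` host and port used in this README; the third-party `requests` library would handle this encoding for you.

```python
import urllib.request
import uuid


def build_multipart(field: str, files: list[tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode (filename, content) pairs under one repeated form field.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random boundary that will not occur in the PDFs
    parts: list[bytes] = []
    for filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts += [header.encode("utf-8"), content, b"\r\n"]
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_request(files: list[tuple[str, bytes]],
                   base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) the POST /api/ingest/upload request."""
    body, content_type = build_multipart("files", files)
    return urllib.request.Request(
        f"{base_url}/api/ingest/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send it, read each PDF with `pathlib.Path(p).read_bytes()`, pass the `(name, bytes)` pairs to `upload_request`, and open the result with `urllib.request.urlopen`.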

Search ArXiv papers

# Via Web UI
1. Click the "Search" tab
2. Enter query (e.g., "Large Language Models")
3. Select papers from results
4. Click "Ingest Selected Papers"

# Via API
curl -X POST "http://localhost:8000/api/arxiv/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "RAG systems", "max_results": 10}'
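From Python, the same JSON request can be built with the standard library. A minimal sketch: `post_json` and `send` are hypothetical helpers, the base URL assumes the default `uvicorn` settings from the run command, and the response fields (`success`, `count`, `results`) follow the ArXiv integration section of the API documentation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the default host/port from the run command


def post_json(path: str, payload: object, method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a JSON request against the local API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


search = post_json("/api/arxiv/search", {"query": "RAG systems", "max_results": 10})
# With the server running: result = send(search); print(result["count"])
```

The same helper covers `DELETE /api/papers/delete` by passing the list of paper IDs as the payload together with `method="DELETE"`.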

Ask questions (Text)

# Via Web UI
1. Type your question in the chat input
2. Press Enter or click Send
3. View the response and its sources

# Via API
curl -X POST "http://localhost:8000/api/query/text" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is retrieval augmented generation?"}'

Use voice assistant

# Via Web UI
1. Click "Voice Toggle" in the header to enable voice mode
2. Click "Record Voice Query"
3. Speak your question
4. Click "Stop Recording"
5. Listen to response (auto-play enabled)

Manage papers

# Via Web UI
1. Click "Manage Papers" in the header
2. View all indexed papers
3. Select papers to delete
4. Click "Delete Selected"

# Via API
# List papers
curl http://localhost:8000/api/papers/list

# Delete papers
curl -X DELETE http://localhost:8000/api/papers/delete \
  -H "Content-Type: application/json" \
  -d '["paper_id1", "paper_id2"]'

View metrics

# Via Web UI
1. Click the "View Metrics Dashboard" link
2. Select time range (1h, 6h, 24h, 7d)
3. View charts and statistics
4. Export metrics if needed

# Via API
curl "http://localhost:8000/api/metrics/dashboard?hours=24"

API documentation

Document ingestion

POST /api/ingest/upload
Content-Type: multipart/form-data

files: PDF file(s)

Response:
{
  "success": true,
  "message": "Processed 2 files",
  "files": [
    {
      "filename": "paper.pdf",
      "paper_id": "paper",
      "chunks_added": 150
    }
  ]
}

ArXiv integration

POST /api/arxiv/search
Content-Type: application/json

{
  "query": "neural networks",
  "max_results": 10
}

Response:
{
  "success": true,
  "count": 10,
  "results": [...]
}

POST /api/arxiv/ingest
Content-Type: application/json

{
  "paper_ids": ["2301.00001", "2301.00002"]
}

Response:
{
  "success": true,
  "papers": [...],
  "successful": 2,
  "failed_count": 0
}

Query processing

POST /api/query/text
Content-Type: application/json

{
  "query": "Explain transformer architecture",
  "image_base64": null
}

image_base64 is optional and may be omitted or null.

Response:
{
  "response": "The transformer architecture...",
  "sources": [
    {
      "paper_id": "paper_123",
      "content": "..."
    }
  ],
  "image_caption": null
}
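A client usually shows the answer followed by the papers it drew on, and several retrieved chunks often come from the same paper. The sketch below turns a `/api/query/text` response into display text, listing each `paper_id` once in first-seen order. `format_answer` and the 80-character preview length are illustrative choices, not part of the API.

```python
def format_answer(response: dict, preview_chars: int = 80) -> str:
    """Render the answer and a de-duplicated source list from a query response."""
    lines = [response.get("response", "")]
    if response.get("image_caption"):
        lines.append(f"[Image: {response['image_caption']}]")
    seen: set[str] = set()
    sources = []
    for src in response.get("sources", []):
        pid = src.get("paper_id", "unknown")
        if pid in seen:
            continue  # keep only the first chunk per paper
        seen.add(pid)
        # Collapse whitespace so multi-line chunk text fits on one line.
        snippet = " ".join(src.get("content", "").split())[:preview_chars]
        sources.append(f"- {pid}: {snippet}")
    if sources:
        lines.append("Sources:")
        lines.extend(sources)
    return "\n".join(lines)
```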

Voice operations

POST /api/voice/transcribe
Content-Type: multipart/form-data

audio: audio file (WAV, MP3, WebM, M4A)

Response:
{
  "success": true,
  "text": "What is attention mechanism",
  "latency": 2.5
}

POST /api/voice/synthesize
Content-Type: application/x-www-form-urlencoded

text=Hello+world&lang=en

Response: audio/mpeg stream
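Form-encoded bodies must escape spaces and reserved characters such as `&` and `=`, which is easy to get wrong by hand. `urllib.parse.urlencode` handles this; the sketch below builds the `/api/voice/synthesize` request without sending it. The field names `text` and `lang` come from the example above; `synthesize_request` is a hypothetical helper, and the base URL assumes the default host and port.

```python
import urllib.parse
import urllib.request


def synthesize_request(text: str, lang: str = "en",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a form-encoded TTS request; the server replies with an audio/mpeg stream."""
    # urlencode escapes spaces as "+" and reserved characters as %XX sequences.
    body = urllib.parse.urlencode({"text": text, "lang": lang}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/voice/synthesize",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = synthesize_request("Hello world")
print(req.data)  # b'text=Hello+world&lang=en'
```

With the server running, `urllib.request.urlopen(req).read()` returns the MP3 bytes, which can be written straight to a `.mp3` file.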
POST /api/voice/query
Content-Type: multipart/form-data

audio: audio file

Response:
{
  "success": true,
  "transcribed_text": "What is RAG",
  "response_text": "RAG stands for...",
  "audio_base64": "...",
  "latency": 10.5,
  "route": "simple_qa"
}
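The voice query response carries the spoken answer inline as `audio_base64`. A minimal sketch for decoding and saving it; the function names are hypothetical, and the `.mp3` default assumes the same gTTS MP3 output as `/api/voice/synthesize`, which the response schema does not state explicitly.

```python
import base64
import binascii
from pathlib import Path
from typing import Optional


def decode_reply_audio(response: dict) -> Optional[bytes]:
    """Return the decoded audio bytes from a /api/voice/query response, or None."""
    if not response.get("success") or not response.get("audio_base64"):
        return None
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(response["audio_base64"], validate=True)
    except binascii.Error:
        return None  # malformed payload


def save_reply_audio(response: dict, out_path: str = "reply.mp3") -> Optional[Path]:
    """Write the decoded reply to disk and return its path (None if nothing to save)."""
    audio = decode_reply_audio(response)
    if audio is None:
        return None
    path = Path(out_path)
    path.write_bytes(audio)
    return path
```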

Papers management

GET /api/papers/list

Response:
{
  "success": true,
  "papers": [
    {
      "paper_id": "paper_123",
      "title": "Paper Title",
      "chunk_count": 150,
      "total_size": 125000
    }
  ],
  "total_papers": 5,
  "total_chunks": 750
}
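The list response aggregates per-chunk records into per-paper totals. One way that aggregation could work, sketched over plain dictionaries: the input fields (`paper_id`, `title`, `text`) are hypothetical chunk metadata, falling back to the ID when a title is missing is an illustrative choice, and treating `total_size` as a character count is an assumption, since the docs do not give its unit.

```python
def summarize_papers(chunks: list[dict]) -> dict:
    """Group chunk records by paper_id and build a /api/papers/list-style payload."""
    papers: dict[str, dict] = {}  # dicts keep insertion order, so papers stay first-seen
    for chunk in chunks:
        pid = chunk["paper_id"]
        entry = papers.setdefault(pid, {
            "paper_id": pid,
            "title": chunk.get("title", pid),  # fall back to the ID if no title
            "chunk_count": 0,
            "total_size": 0,
        })
        entry["chunk_count"] += 1
        entry["total_size"] += len(chunk.get("text", ""))  # assumed: size in characters
    return {
        "success": True,
        "papers": list(papers.values()),
        "total_papers": len(papers),
        "total_chunks": len(chunks),
    }
```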
DELETE /api/papers/delete
Content-Type: application/json

["paper_id1", "paper_id2"]

Response:
{
  "success": true,
  "message": "Deleted 2 papers",
  "chunks_deleted": 300
}

Metrics

GET /api/metrics/summary?hours=24

Response:
{
  "success": true,
  "data": {
    "total_operations": 150,
    "success_rate": 98.5,
    "avg_latency": 3.2,
    "total_cost": 0.0245
  }
}
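Because metrics are persisted as JSONL, a summary like this can be computed by streaming the file and filtering on a time window. A sketch under stated assumptions: each line is one operation with hypothetical fields `timestamp` (Unix seconds), `success`, `latency` (seconds), and `cost` (USD), and the real `metrics_tracker` schema may differ. `success_rate` is returned as a percentage to match the example response.

```python
import json
import time
from typing import Iterable, Optional


def summarize_metrics(lines: Iterable[str], hours: float = 24,
                      now: Optional[float] = None) -> dict:
    """Aggregate JSONL operation records from the last `hours` hours."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    total = successes = 0
    latency_sum = cost_sum = 0.0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("timestamp", 0) < cutoff:
            continue  # outside the requested window
        total += 1
        successes += bool(record.get("success"))
        latency_sum += record.get("latency", 0.0)
        cost_sum += record.get("cost", 0.0)
    return {
        "total_operations": total,
        "success_rate": round(100 * successes / total, 1) if total else 0.0,
        "avg_latency": round(latency_sum / total, 2) if total else 0.0,
        "total_cost": round(cost_sum, 4),
    }
```

Passing an open file handle (for example a hypothetical `metrics.jsonl`) works directly, because iterating a text file yields one line at a time.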
GET /api/metrics/dashboard?hours=24

Response:
{
  "success": true,
  "summary": {...},
  "operations": {...},
  "recent_errors": [...],
  "insights": {...}
}

Project structure

arxiv-insight-engine/
├── src/
│   ├── agents/
│   │   ├── nodes/              # LangGraph processing nodes
│   │   │   ├── analyzer.py     # Deep analysis node
│   │   │   ├── comparison.py   # Comparison analysis
│   │   │   ├── fact_checker.py # Fact verification
│   │   │   ├── retriever.py    # Hybrid retrieval
│   │   │   ├── router.py       # Query classification
│   │   │   ├── simple_qa.py    # Q&A processing
│   │   │   ├── summarizer.py   # Summarization
│   │   │   ├── synthesizer.py  # Cross-paper synthesis
│   │   │   └── visual_analyzer.py # Figure analysis
│   │   ├── tools/
│   │   │   ├── hybrid_retriever.py  # Vector + BM25
│   │   │   ├── image_captioner.py   # BLIP captioning
│   │   │   └── summarizer.py        # BART summarization
│   │   └── graph.py            # LangGraph orchestration
│   ├── app/
│   │   ├── static/
│   │   │   ├── css/
│   │   │   │   ├── variables.css    # CSS variables
│   │   │   │   ├── base.css         # Base styles
│   │   │   │   ├── components.css   # UI components
│   │   │   │   ├── animations.css   # Animations
│   │   │   │   ├── voice.css        # Voice UI
│   │   │   │   └── papers.css       # Papers manager
│   │   │   └── js/
│   │   │       ├── api.js           # API client
│   │   │       ├── chat.js          # Chat interface
│   │   │       ├── main.js          # Main app logic
│   │   │       ├── search.js        # ArXiv search
│   │   │       ├── state.js         # State management
│   │   │       ├── upload.js        # File upload
│   │   │       ├── voice.js         # Voice assistant
│   │   │       └── papers.js        # Papers manager
│   │   ├── templates/
│   │   │   ├── index.html           # Main UI
│   │   │   └── metrics.html         # Metrics dashboard
│   │   ├── main.py                  # FastAPI application
│   │   └── voice_handler.py         # Voice processing
│   ├── embeddings/
│   │   └── embedder.py              # Embedding generation
│   ├── ingest/
│   │   ├── loader/
│   │   │   └── arxiv_loader.py      # ArXiv integration
│   │   ├── parser/
│   │   │   └── multimodal_parser.py # PDF parsing
│   │   ├── pipeline.py              # Ingestion pipeline
│   │   └── processor.py             # Document processing
│   ├── models/
│   │   ├── document.py              # Document models
│   │   └── request.py               # API models
│   ├── monitoring/
│   │   └── metrics_tracker.py       # Metrics collection
│   └── stores/
│       ├── feedback_store.py        # User feedback
│       └── vector_store.py          # ChromaDB interface
├── chroma_db/                       # Vector database
├── config.py                        # Configuration
├── requirements.txt                 # Dependencies
└── README.md                        # This file

About

AI-powered research assistant with voice interaction, multimodal RAG, and intelligent routing. Explore academic papers effortlessly using LLMs & ChromaDB.
