An open-source, high-performance RAG (Retrieval-Augmented Generation) platform built with Python (FastAPI) and TypeScript (React), designed for document-based AI assistants.
This project aims to create a powerful RAG platform, inspired by systems like Dify, with a clean modular architecture: FastAPI for backend APIs and orchestration, and a modern React web UI for management and workflows.
- Intelligent Q&A: Perform complex question-answering on your documents using a RAG pipeline.
- Knowledge Base Management: Easily create and manage distinct knowledge bases.
- Multi-Format Document Support: Upload and process various document formats (starting with .txt and .md).
- Flexible API: A straightforward RESTful API for integration with any application.
- Multi-Model Support: Supports DeepSeek, Qwen, and SiliconFlow APIs for different use cases.
- High-Performance Backend: FastAPI-based backend for asynchronous request handling.
- Modern Web Interface: React-based frontend with Material-UI for intuitive management.
- Flexible Configuration: Easy model switching and configuration management.
- Internationalization: Support for Chinese and English language switching.
The platform exposes a simple public API, authenticated with an x-api-key header, so you can validate workflows via chat and embed the assistant into any web page.
- Public endpoints (no login; require the x-api-key header):
  - POST /api/v1/public/chat: non-stream chat; the request body is a ChatRequest.
  - POST /api/v1/public/chat/stream: streaming chat (SSE), suitable for web embeds.
  - POST /api/v1/public/workflows/{workflow_id}/run: run a saved workflow (non-stream).
  - POST /api/v1/public/workflows/{workflow_id}/run/stream: run a saved workflow (SSE: started/progress/complete).
  - GET /api/v1/public/workflows/{workflow_id}/io-schema: infer the workflow-level input/output schema for integration.
  - POST /api/v1/public/workflows/{workflow_id}/execute: legacy alias of run (kept for compatibility).
- Admin endpoints for API key management:
  - POST /api/v1/admin/api-keys: create a key (scopes: chat, workflow; optional allowed_kb, allowed_workflow_id).
  - GET /api/v1/admin/api-keys: list keys for your tenant.
  - DELETE /api/v1/admin/api-keys/{id}: revoke a key.
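If you call the public API from Python rather than curl, the standard library is enough to build the same requests. This is a minimal sketch: the helper names (build_public_request, run_workflow_request, public_chat_request) are hypothetical, and only the paths, the x-api-key header and the input_data, message and knowledge_base_id body fields come from this README.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_public_request(base_url: str, api_key: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a public endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def run_workflow_request(base_url: str, api_key: str, workflow_id: str,
                         input_data: dict) -> urllib.request.Request:
    """POST /api/v1/public/workflows/{workflow_id}/run (non-stream)."""
    wf = urllib.parse.quote(str(workflow_id), safe="")
    return build_public_request(base_url, api_key,
                                f"/api/v1/public/workflows/{wf}/run",
                                {"input_data": input_data})


def public_chat_request(base_url: str, api_key: str, message: str,
                        knowledge_base_id: Optional[str] = None) -> urllib.request.Request:
    """POST /api/v1/public/chat (non-stream); knowledge_base_id enables RAG."""
    body = {"message": message}
    if knowledge_base_id is not None:
        body["knowledge_base_id"] = knowledge_base_id
    return build_public_request(base_url, api_key, "/api/v1/public/chat", body)
```

To send a request, pass it to urllib.request.urlopen() and decode the JSON response body.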
Run (non-stream):
curl -sS -X POST "https://your-host/api/v1/public/workflows/<workflow_id>/run" \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_KEY" \
-d '{"input_data":{"text":"hello"}}'
Run (stream, SSE):
curl -N -X POST "https://your-host/api/v1/public/workflows/<workflow_id>/run/stream" \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_KEY" \
-d '{"input_data":{"text":"hello"},"debug":false}'
To embed the assistant in a web page:
Option 1: iframe
<iframe
src="https://your-host/embed.html?api_key=YOUR_KEY&kb=your_kb&api_base=https://your-host"
style="width: 100%; height: 560px; border: 1px solid #eee; border-radius: 8px"
></iframe>
Option 2: fetch from your own widget
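const res = await fetch('https://your-host/api/v1/public/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'x-api-key': 'YOUR_KEY' },
  body: JSON.stringify({ message: 'Hello', knowledge_base_id: 'your_kb' }),
});
// Read SSE chunks from res.body as they arrive; one chunk may hold a partial or several events.
const reader = res.body.getReader();
const decoder = new TextDecoder();
for (let chunk = await reader.read(); !chunk.done; chunk = await reader.read()) {
  console.log(decoder.decode(chunk.value, { stream: true })); // replace with your rendering logic
}
Notes: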
- Public chat supports RAG via knowledge_base_id and automatically routes to your tenant's KB.
- Public workflow execution supports cross-tenant public workflows: an API key from tenant A can run a workflow owned by tenant B if that workflow is marked is_public=true.
- To restrict cross-tenant workflow execution, bind the API key to a specific workflow with allowed_workflow_id.
- For isolation, the runtime injects an execution context: tenant_id is set to the workflow owner's tenant, and user_id=0.
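The streaming endpoints (chat/stream and run/stream) return Server-Sent Events. Below is a minimal Python parser for the text/event-stream format, written as a sketch rather than a client shipped with this project. It makes no assumption about payloads, so whether started/progress/complete arrive as SSE event names or inside the JSON data field is something to check against your server's actual output.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from decoded SSE lines.

    "event:" sets the type (default "message"), "data:" lines are joined
    with newlines, lines starting with ":" are comments, and a blank line
    dispatches the event. "id:" and "retry:" fields are ignored here.
    An event left incomplete at end of stream is discarded, as in browsers.
    """
    event_type = ""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield (event_type or "message", "\n".join(data_lines))
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
```

With a urlopen() response, iterate over iter_sse_events(line.decode("utf-8") for line in resp) and json.loads each data string if your server sends JSON.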
The system is designed with a clean separation of concerns:
- FastAPI Backend (Python): Handles all API requests, business logic, and orchestration.
- React Frontend (TypeScript): Modern web interface with Material-UI components.
- Milvus: Acts as the vector database for storing and retrieving document embeddings.
- Elasticsearch: Provides full-text search capabilities for hybrid retrieval.
- Multi-Model Support: Integrates with DeepSeek, Qwen, and SiliconFlow APIs.
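The README does not say how Milvus (vector) and Elasticsearch (keyword) results are merged for hybrid retrieval. One common technique is reciprocal rank fusion (RRF), sketched below for illustration; it is not necessarily what this platform implements. The constant k = 60 is the value usually used with RRF.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk IDs with reciprocal rank fusion.

    Each list adds 1 / (k + rank) to every ID it contains (rank starts
    at 1); IDs are returned by descending total score.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break score ties by ID so the output order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF only needs ranks, not scores, which avoids normalizing Milvus similarity values against Elasticsearch BM25 scores.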
You can start the whole stack with Docker Compose (recommended), or run backend/frontend locally for development.
- Python 3.9+
- Docker + Docker Compose (recommended for one-command startup)
- Node.js with npm (to run the frontend locally)
- An available Milvus instance (when running the backend without Docker Compose).
- A DashScope API key for the Qwen models.
Bring up backend + frontend + MySQL + Milvus + Elasticsearch:
# (Optional but recommended) create root .env for API keys and overrides
cp backend/.env.example .env
docker compose -f docker-compose.dev.yml up -d --build
Access:
- Frontend: http://localhost:5173
- Backend API docs: http://localhost:8000/api/v1/docs
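After docker compose up -d, the backend may not answer right away while MySQL, Milvus and Elasticsearch initialize. A small polling helper like the hypothetical one below (not part of the repository) waits until a URL returns HTTP 200; probe and sleep are injectable so the logic can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(url: str, timeout: float = 120.0, interval: float = 2.0,
                     probe: Callable[[str], int] = http_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll url until it returns 200 (True) or timeout seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For the Compose stack, call wait_until_ready("http://localhost:8000/api/v1/docs") before opening the UI or running scripts.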
Useful commands:
docker compose -f docker-compose.dev.yml logs -f backend
docker compose -f docker-compose.dev.yml down
# reset volumes if you hit init errors (THIS DELETES DB/VECTOR DATA)
docker compose -f docker-compose.dev.yml down -v
- Clone the Repository
git clone <your-repo-url>
cd ragJ_platform/backend
- Configure Environment Variables
Create a .env file in the backend/ directory by copying the example:
cp .env.example .env
Now, edit the .env file and set your credentials:
# backend/.env
# Your DashScope API key for Qwen models
DASHSCOPE_API_KEY="your_sk_key_here"
# Connection details for your Milvus instance
MILVUS_HOST="localhost"
MILVUS_PORT="19530"
- Install Dependencies
It is highly recommended to use a virtual environment.
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
- Run the Server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
- Access the API
Once the server is running, you can access the interactive API documentation at: http://localhost:8000/docs
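Before starting uvicorn, you can check that backend/.env no longer holds placeholder values. The helper below is hypothetical; its KEY=VALUE parser is deliberately simple (no export prefixes, inline comments or multi-line values), and the required keys are the three from the example file in the configuration step.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("DASHSCOPE_API_KEY", "MILVUS_HOST", "MILVUS_PORT")
PLACEHOLDERS = {"", "your_sk_key_here"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or still set to a placeholder."""
    return [key for key in REQUIRED_KEYS if env.get(key, "") in PLACEHOLDERS]


def check_env_file(path: str = ".env") -> List[str]:
    """Run the check against a file, e.g. backend/.env."""
    return missing_settings(parse_env(Path(path).read_text(encoding="utf-8")))
```

An empty list from check_env_file() means every required key has a non-placeholder value.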
The platform includes a modern React-based web interface for easy management.
- Navigate to the Frontend Directory (from the repository root)
cd frontend
- Install Dependencies
npm install
- Start the Development Server
npm run dev
- Access the Web Interface
The frontend will be available at: http://localhost:5173
- Dashboard: System overview and statistics
- Knowledge Base Management: Create, delete, and manage knowledge bases
- Intelligent Chat: Interactive chat interface with knowledge base selection
- Model Configuration: Easy setup for DeepSeek, Qwen, and SiliconFlow APIs
- Document Management: Upload and manage documents (coming soon)
- Language Support: Switch between Chinese and English interface
- The Elasticsearch index mapping has been updated to include tenant_id/user_id (integer) and document_name/knowledge_base (keyword) fields. For existing knowledge bases created with older mappings, you can rebuild the ES index:
  - POST /api/v1/knowledge-bases/{kb_name}/maintenance/rebuild-es-index: recreate the ES index only.
  - POST /api/v1/knowledge-bases/{kb_name}/maintenance/rebuild-es-index?reindex=true: recreate the index and reindex documents (re-parses the source files, so chunking may differ from Milvus).
- UPLOAD_DIR controls where uploaded files are stored (default: /tmp/uploads).
- Set USE_CELERY=true and configure CELERY_BROKER_URL / CELERY_RESULT_BACKEND to offload document processing to a Celery worker instead of the API process.
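To illustrate the mapping note, here is a sketch of those fields as an Elasticsearch mapping, a query that filters on them, and a URL builder for the rebuild endpoint. The content field name, the query shape and the helper names are assumptions; the four typed fields and the endpoint path come from this README. Integer and keyword fields support exact term filters, which is presumably why they were added (for example, to scope keyword search to one tenant and knowledge base).

```python
from urllib.parse import quote, urlencode

# Sketch of the updated mapping. "content" is an assumed name for the
# chunk-text field; the four typed fields are the ones listed above.
ES_MAPPING = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "tenant_id": {"type": "integer"},
            "user_id": {"type": "integer"},
            "document_name": {"type": "keyword"},
            "knowledge_base": {"type": "keyword"},
        }
    }
}


def scoped_search_query(text: str, tenant_id: int, kb_name: str) -> dict:
    """Full-text match restricted to one tenant and knowledge base (hypothetical)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                # term filters need exact-value fields (integer / keyword)
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    {"term": {"knowledge_base": kb_name}},
                ],
            }
        }
    }


def rebuild_es_index_url(base_url: str, kb_name: str, reindex: bool = False) -> str:
    """URL for the rebuild-es-index maintenance endpoint (call it with POST)."""
    path = f"/api/v1/knowledge-bases/{quote(kb_name, safe='')}/maintenance/rebuild-es-index"
    query = ("?" + urlencode({"reindex": "true"})) if reindex else ""
    return base_url.rstrip("/") + path + query
```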
Here is how to use the core RAG pipeline via the API.
First, create a new knowledge base. This corresponds to a new "collection" in Milvus.
curl -X 'POST' \
'http://localhost:8000/api/v1/knowledge-bases/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "my_first_kb",
"description": "A knowledge base for testing."
}'
A successful response will confirm that the knowledge base was created.
Next, upload a document (.txt or .md) to your new knowledge base. The system will process it in the background (chunking, embedding, and indexing).
Note: Make sure you have a file named sample.txt in your current directory.
curl -X 'POST' \
'http://localhost:8000/api/v1/knowledge-bases/my_first_kb/documents/' \
-H 'accept: application/json' \
-F 'file=@sample.txt;type=text/plain'
The API will respond immediately, confirming that the file has been accepted for processing.
Once the document has been processed, you can start asking questions. The system will retrieve relevant context from your documents to generate an answer.
curl -X 'POST' \
'http://localhost:8000/api/v1/chat/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"message": "What is this document about?",
"knowledge_base_id": "my_first_kb",
"model": "qwen-turbo"
}'
The response will contain the AI's answer, generated based on the content of the document you uploaded.
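The three curl steps above can be reproduced from Python with only the standard library. This is a sketch: the helper names are hypothetical, BASE_URL assumes the local development server, and the multipart encoder mirrors what curl -F sends for a single file.

```python
import json
import urllib.parse
import urllib.request
import uuid
from typing import Tuple

BASE_URL = "http://localhost:8000/api/v1"  # assumed: local dev server


def json_post(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def create_kb_request(name: str, description: str) -> urllib.request.Request:
    # Step 1: POST /knowledge-bases/
    return json_post(f"{BASE_URL}/knowledge-bases/", {"name": name, "description": description})


def encode_multipart_file(field: str, filename: str, content: bytes,
                          content_type: str = "text/plain") -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_request(kb_name: str, filename: str, content: bytes) -> urllib.request.Request:
    # Step 2: POST /knowledge-bases/{kb}/documents/ with a "file" form field
    body, content_type = encode_multipart_file("file", filename, content)
    kb = urllib.parse.quote(kb_name, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/knowledge-bases/{kb}/documents/",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )


def chat_request(message: str, knowledge_base_id: str,
                 model: str = "qwen-turbo") -> urllib.request.Request:
    # Step 3: POST /chat/ once the document has been processed
    return json_post(f"{BASE_URL}/chat/", {
        "message": message, "knowledge_base_id": knowledge_base_id, "model": model,
    })
```

Send each request with urllib.request.urlopen(). Because documents are processed in the background, give processing time to finish before sending the chat request.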
The platform supports multiple AI model providers for different use cases:
- DeepSeek
  - Best for: code generation, technical documentation
  - Models: deepseek-chat, deepseek-coder
  - API: https://api.deepseek.com/v1
- Qwen (Alibaba Cloud DashScope)
  - Best for: Chinese-language tasks, comprehensive AI capabilities
  - Models: qwen-turbo, qwen-plus, qwen-max
  - API: https://dashscope.aliyuncs.com/compatible-mode/v1
- SiliconFlow
  - Best for: cost-effective embedding and reranking
  - Models: various open-source models, including the BGE series
  - API: https://api.siliconflow.cn/v1
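All three base URLs expose OpenAI-compatible chat completion endpoints, so a provider key can be checked outside the web UI with a request like the one below. This calls the providers directly; it is a hypothetical helper, not the platform's internal client.

```python
import json
import urllib.request
from typing import Dict, List

# Base URLs from the provider list above.
PROVIDERS: Dict[str, str] = {
    "deepseek": "https://api.deepseek.com/v1",
    "qwen": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "siliconflow": "https://api.siliconflow.cn/v1",
}


def chat_completion_request(provider: str, api_key: str, model: str,
                            messages: List[Dict[str, str]]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    url = PROVIDERS[provider] + "/chat/completions"
    body = {"model": model, "messages": messages}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
```

A 200 response from urllib.request.urlopen() on this request means the key and model name are accepted; a 401 usually points to a wrong key.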
The web interface provides three pre-configured setups:
- Economic Configuration
  - Chat: DeepSeek
  - Embedding: SiliconFlow BGE
  - Rerank: SiliconFlow BGE
- Premium Configuration
  - Chat: Qwen Max
  - Embedding: Qwen Embedding
  - Rerank: Qwen Rerank
- Chinese-Optimized Configuration
  - Chat: Qwen Plus
  - Embedding: SiliconFlow BGE Chinese
  - Rerank: SiliconFlow BGE Reranker
To configure your models:
- Visit the Settings page in the web interface
- Choose a preset or configure manually
- Add your API keys for each provider
- Test the connections
- Save the configuration
The web interface supports both Chinese and English:
- Language Switching: Click the language icon (top-right) to switch between Chinese and English
- Auto Detection: The system automatically detects your browser language preference
- Persistent Settings: Your selection is cached in the browser's localStorage and remembered across sessions
- Chinese: Full interface translation for Chinese users
- English: Complete English interface for international users
All interface elements are fully translated and localized for both languages, including:
- Navigation menus
- Form labels and buttons
- Error messages and notifications
- Help text and descriptions
- Model configuration options
- Formats: PDF, DOCX, TXT, Markdown, HTML
- Capabilities: text extraction, structure parsing, metadata extraction
- Chunking: smart chunking, fixed-length, semantic splitting
- Embedding models: OpenAI, Hugging Face, local models
- Vector store: indexing and retrieval optimizations
- Hybrid retrieval: semantic + keyword search
- RAG pipeline: retrieve + generate
- Model options: GPT-class models, Claude-class models, open-source LLMs
- Conversation context: multi-turn chat support
- Workflow builder: graph-based agent/workflow design
- State management: persistent execution state and checkpoints
- Multi-agent collaboration: agents can coordinate and communicate
- Conditional routing: route by conditions during execution
- Organization: hierarchical KB management
- Access control: fine-grained permissions
- Versioning: document version tracking
- Secrets: ensure no API keys/tokens are committed; use backend/.env.example as the template.
- Links: replace placeholder links like https://github.com/your-org/... and any non-existent domains/emails.
- Images: replace/update screenshots under images/.
- License: keep the README's license statement consistent with LICENSE.
# Generate an API key
from backend.app.core.security import generate_api_key
api_key = generate_api_key(user_id="user_123")
# config/permissions.yml
roles:
  admin:
    - knowledge_base:*
    - document:*
    - user:*
  user:
    - knowledge_base:read
    - document:upload
    - chat:query
# .env.production
DATABASE_URL=postgresql://user:pass@db:5432/ragj_platform
REDIS_URL=redis://redis:6379/0
STORAGE_BACKEND=seaweedfs
S3_ENDPOINT=http://seaweedfs:8333
S3_ACCESS_KEY=your_access_key
S3_SECRET_KEY=your_secret_key
S3_BUCKET_NAME=ragj-documents
# LLM configuration
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL=https://api.openai.com/v1
# Security configuration
SECRET_KEY=your_super_secret_key
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
# Service configuration
API_V1_STR=/api/v1
PROJECT_NAME=RAG Platform
DEBUG=false
# docker-compose.monitoring.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3001:3000"
- Async/concurrent processing (FastAPI + background tasks)
- Batch embedding and request coalescing (lower embedding cost)
- Tune chunking and retrieval parameters (quality vs latency)
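Batch embedding and caching combine naturally: deduplicate the texts in a call, reuse vectors already computed, and send only the misses to the provider in fixed-size batches. The class below is a hypothetical sketch of that idea (it coalesces duplicate texts within a call, not concurrent requests across callers). embed_batch stands in for your provider call, and the in-process dict would typically be replaced by Redis, with a TTL or eviction policy, when several workers share the cache.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class CachedBatchEmbedder:
    """Deduplicate texts, serve repeats from a cache, embed misses in batches."""

    def __init__(self, embed_batch: Callable[[List[str]], List[Vector]],
                 model: str, batch_size: int = 32) -> None:
        self.embed_batch = embed_batch  # hypothetical provider call
        self.model = model
        self.batch_size = batch_size
        self.cache: Dict[str, Vector] = {}  # unbounded; swap for Redis in production

    def _key(self, text: str) -> str:
        # Include the model name so switching models never reuses stale vectors.
        return hashlib.sha256(f"{self.model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        # Unique cache misses, kept in first-seen order.
        misses: Dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self.cache and key not in misses:
                misses[key] = text
        pending = list(misses.items())
        for start in range(0, len(pending), self.batch_size):
            chunk = pending[start:start + self.batch_size]
            vectors = self.embed_batch([text for _, text in chunk])
            if len(vectors) != len(chunk):
                raise ValueError("embed_batch returned a different number of vectors")
            for (key, _), vec in zip(chunk, vectors):
                self.cache[key] = vec
        return [self.cache[key] for key in keys]
```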
-- Vector retrieval index
CREATE INDEX idx_embeddings_vector ON document_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Metadata query indexes
CREATE INDEX idx_documents_kb_id ON documents(knowledge_base_id);
CREATE INDEX idx_chunks_doc_id ON document_chunks(document_id);
- Redis for hot queries
- Embedding cache
- Document processing result cache
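The pgvector ivfflat index above hard-codes lists = 100. pgvector's documentation suggests starting with rows / 1000 lists for up to 1M rows and sqrt(rows) beyond that, querying with about sqrt(lists) probes (set with SET ivfflat.probes), and creating the index after the table already contains data. The hypothetical helpers below only encode those starting points; by that rule, lists = 100 suits a table of roughly 100k chunks.

```python
import math


def ivfflat_lists(row_count: int) -> int:
    """Starting value for ivfflat `lists`: rows / 1000 up to 1M rows, sqrt(rows) above."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))


def ivfflat_probes(lists: int) -> int:
    """Starting value for ivfflat.probes at query time: about sqrt(lists)."""
    return max(1, round(math.sqrt(lists)))
```

Treat both values as starting points: more probes improve recall at the cost of latency.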
# Install dev dependencies
pip install -r requirements-dev.txt
# Formatting
black backend/
# Type checking
mypy backend/app/
# Tests
pytest backend/tests/
# After starting the backend, visit:
http://localhost:8000/docs # Swagger UI
http://localhost:8000/redoc # ReDoc
http://localhost:8000/openapi.json # OpenAPI spec
- Fork the repo
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License; see LICENSE.
- Discussions: GitHub Discussions
- Issues: GitHub Issues
- Docs: docs/
Note: This is a baseline implementation suitable for learning and small deployments. For production use, harden security and tune performance for your workload.