A FastAPI-based backend service that processes and analyzes Informed Consent documents for human subjects research. The service uses advanced language models via Azure OpenAI Service to generate clear, understandable summaries of complex research consent documents.
- PDF document processing and analysis
- Hierarchical document parsing using LlamaIndex
- Concurrent processing of document sections
- Azure OpenAI Service integration (e.g., GPT-4o, text-embedding models) for intelligent summarization
- Specialized in bioethics and patient advocacy context
- RESTful API endpoints for document upload and processing
- Python 3.x
- Azure OpenAI Service access and credentials (see Environment Variables)
- A `.env` file in the project root to store credentials.
- Clone the repository:

  ```bash
  git clone [repository-url]
  cd irb-ki-backend  # Or your project directory name
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  # On Windows: venv\Scripts\activate
  # On macOS/Linux: source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  Create a `.env` file in the project root and add your Azure OpenAI credentials:

  ```
  OPENAI_API_BASE=<your_OPENAI_API_BASE>
  OPENAI_API_KEY=<your_OPENAI_API_KEY>
  OPENAI_API_VERSION=<your_openai_api_version>
  AZURE_OPENAI_DEPLOYMENT_LLM=<your_llm_deployment_name>
  AZURE_OPENAI_DEPLOYMENT_EMBED=<your_embedding_deployment_name>
  ```

  Replace the placeholders (`<...>`) with your actual values. The application uses python-dotenv to load these variables automatically.
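Since a missing credential otherwise only surfaces later as an Azure API error, a small startup check can fail fast. The helper below is a hypothetical sketch, not part of the codebase; only the variable names come from the list above:

```python
import os

# python-dotenv (called by the app at startup) loads .env into the process
# environment; after that the settings are plain environment variables.
REQUIRED_VARS = [
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_LLM",
    "AZURE_OPENAI_DEPLOYMENT_EMBED",
]

def missing_settings(env=os.environ):
    """Return the names of any required settings that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_settings()` at startup and raising if the result is non-empty turns a misconfigured deployment into an immediate, readable error.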
- `GET /`: Health check endpoint
  - Response: `{"Hello": "World3"}`
- `GET /plugins/`: List all available plugins
  - Returns list of plugin IDs with their names and descriptions
  - Users must choose one of these plugins for document processing
- `GET /plugins/{plugin_id}/`: Get detailed information about a specific plugin
  - Returns plugin capabilities, templates, and usage instructions
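To make the response shape of the two plugin endpoints concrete, here is a toy in-memory registry. The structure, field names, and descriptions are illustrative assumptions, not the app's actual code; only the plugin IDs appear in this README:

```python
# Hypothetical in-memory plugin registry backing the /plugins/ endpoints.
# Field names and descriptions are assumptions for illustration.
PLUGINS = {
    "informed-consent-ki": {
        "name": "Informed Consent KI",
        "description": "Summarizes informed consent documents",
    },
    "clinical-protocol": {
        "name": "Clinical Protocol",
        "description": "Summarizes clinical protocol documents",
    },
}

def list_plugins():
    """Shape of a GET /plugins/ response: one entry per plugin ID."""
    return [
        {"id": pid, "name": p["name"], "description": p["description"]}
        for pid, p in PLUGINS.items()
    ]

def get_plugin(plugin_id):
    """Shape of GET /plugins/{plugin_id}/; None for an unknown ID."""
    return PLUGINS.get(plugin_id)
```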
- `POST /generate/`: Process documents with explicit plugin selection
  - Required Parameters:
    - `file`: PDF file to process
    - `plugin_id`: ID of the plugin to use (e.g., "informed-consent-ki", "clinical-protocol")
  - Optional Parameters:
    - `template_id`: Specific template within the plugin
    - `parameters`: JSON string with additional configuration
  - Returns: JSON object with `sections` and `texts` arrays
- `POST /uploadfile/`: Process informed consent documents (uses the "informed-consent-ki" plugin)
  - Accepts: PDF file upload
  - Returns: JSON object containing sectioned summaries
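A call to `/generate/` is a multipart upload. The standard-library helper below builds such a request by hand to show exactly what goes over the wire; the helper itself is hypothetical, and only the field names (`file`, `plugin_id`) come from the API description above:

```python
import urllib.request
import uuid

def build_generate_request(url, pdf_bytes, plugin_id):
    """Build a multipart/form-data POST for /generate/ (stdlib-only sketch)."""
    boundary = uuid.uuid4().hex
    # plugin_id form field
    plugin_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="plugin_id"\r\n\r\n'
        f"{plugin_id}\r\n"
    ).encode()
    # file form field carrying the PDF bytes
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="doc.pdf"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = plugin_part + file_header + pdf_bytes + closing
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# urllib.request.urlopen(req) would send it once the server is running.
```

In practice `requests` or `httpx` make this a one-liner (`files={"file": ...}`), but the manual version is dependency-free and makes the wire format explicit.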
The application uses a multi-layered approach to process documents:
- Document Ingestion: PDF documents are processed using a custom PDF reader
- Text Processing: Documents are parsed into hierarchical nodes using LlamaIndex
- Embedding & Indexing: Text is embedded using an Azure OpenAI embedding model (e.g., `text-embedding-3-large`)
- Query Processing: Uses LlamaIndex with Azure OpenAI models for efficient document querying and reranking
- Summary Generation: Employs an Azure OpenAI chat model (e.g., `gpt-4o`) for generating clear, authoritative summaries
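The hierarchical parsing step can be pictured with a toy stand-in: the same text is split at several granularities, so coarse chunks preserve context while fine chunks give precision. This illustrates the idea only; it is not LlamaIndex's `HierarchicalNodeParser`, which additionally tracks parent/child links between levels so retrieval can merge up to a parent node:

```python
def hierarchical_chunks(text, chunk_sizes=(2048, 512, 128)):
    """Split `text` at several granularities, coarse to fine.

    Toy illustration of hierarchical parsing: each level is a flat split
    of the full text; the real parser also links children to parents.
    """
    return {
        size: [text[i:i + size] for i in range(0, len(text), size)]
        for size in chunk_sizes
    }
```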
The project follows these key principles:
- Concurrent processing for improved performance
- Hierarchical document parsing for better context understanding
- Error handling and input validation
- CORS middleware for frontend integration
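The concurrency principle above maps naturally onto `asyncio.gather`: each section's summarization runs as its own task, so slow model calls overlap instead of queuing. A minimal sketch, with the real Azure OpenAI call replaced by a placeholder:

```python
import asyncio

async def summarize_section(section: str) -> str:
    # Placeholder for the real Azure OpenAI call; any awaited I/O here
    # would overlap with the other sections' calls.
    await asyncio.sleep(0)
    return f"summary of {section}"

async def summarize_all(sections: list[str]) -> list[str]:
    # gather() runs all section tasks concurrently and preserves order.
    return await asyncio.gather(*(summarize_section(s) for s in sections))

summaries = asyncio.run(summarize_all(["Purpose", "Risks", "Benefits"]))
```

With N sections, total latency approaches that of the slowest single call rather than the sum of all calls.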
Key dependencies include:
- fastapi[standard] >= 0.113.0
- pydantic >= 2.7.0
- python-multipart
- pypdf
- llama-index
- llama-index-llms-azure-openai
- llama-index-embeddings-azure-openai
- python-dotenv
See requirements.txt for specific versions.
Start the FastAPI server:

```bash
uvicorn app.main:app --reload
```

The server will start on `http://localhost:8000` by default.