A FastAPI-based backend service that processes and analyzes Informed Consent documents for human subjects research. The service uses advanced language models via Azure OpenAI Service to generate clear, understandable summaries of complex research consent documents.
- PDF document processing and analysis
- Hierarchical document parsing using LlamaIndex
- Concurrent processing of document sections
- Azure OpenAI Service integration (e.g., GPT-4o, text-embedding models) for intelligent summarization
- Specialized in bioethics and patient advocacy context
- RESTful API endpoints for document upload and processing
- Python 3.x
- Azure OpenAI Service access and credentials (see Environment Variables)
- A `.env` file in the project root to store credentials.
- Clone the repository:

  ```bash
  git clone [repository-url]
  cd irb-ki-backend  # Or your project directory name
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  # On Windows: venv\Scripts\activate
  # On macOS/Linux: source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  Create a `.env` file in the project root and add your Azure OpenAI credentials:

  ```
  OPENAI_API_BASE=<your_OPENAI_API_BASE>
  OPENAI_API_KEY=<your_OPENAI_API_KEY>
  OPENAI_API_VERSION=<your_openai_api_version>
  AZURE_OPENAI_DEPLOYMENT_LLM=<your_llm_deployment_name>
  AZURE_OPENAI_DEPLOYMENT_EMBED=<your_embedding_deployment_name>
  ```

  Replace the placeholders (`<...>`) with your actual values. The application uses python-dotenv to load these variables automatically.
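Since a missing credential otherwise only surfaces later as an Azure API error, a small startup check can fail fast. The helper below is a hypothetical sketch, not part of the codebase; only the variable names come from the list above:

```python
import os

# python-dotenv (called by the app at startup) loads .env into the process
# environment; after that the settings are plain environment variables.
REQUIRED_VARS = [
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_LLM",
    "AZURE_OPENAI_DEPLOYMENT_EMBED",
]

def missing_settings(env=os.environ):
    """Return the names of any required settings that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_settings()` at startup and raising if the result is non-empty turns a misconfigured deployment into an immediate, readable error.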
- `GET /`: Health check endpoint
  - Response: `{"Hello": "World3"}`
- `GET /plugins/`: List all available plugins
  - Returns list of plugin IDs with their names and descriptions
  - Users must choose one of these plugins for document processing
- `GET /plugins/{plugin_id}/`: Get detailed information about a specific plugin
  - Returns plugin capabilities, templates, and usage instructions
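To make the response shape of the two plugin endpoints concrete, here is a toy in-memory registry. The structure, field names, and descriptions are illustrative assumptions, not the app's actual code; only the plugin IDs appear in this README:

```python
# Hypothetical in-memory plugin registry backing the /plugins/ endpoints.
# Field names and descriptions are assumptions for illustration.
PLUGINS = {
    "informed-consent-ki": {
        "name": "Informed Consent KI",
        "description": "Summarizes informed consent documents",
    },
    "clinical-protocol": {
        "name": "Clinical Protocol",
        "description": "Summarizes clinical protocol documents",
    },
}

def list_plugins():
    """Shape of a GET /plugins/ response: one entry per plugin ID."""
    return [
        {"id": pid, "name": p["name"], "description": p["description"]}
        for pid, p in PLUGINS.items()
    ]

def get_plugin(plugin_id):
    """Shape of GET /plugins/{plugin_id}/; None for an unknown ID."""
    return PLUGINS.get(plugin_id)
```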
- `POST /generate/`: Process documents with explicit plugin selection
  - Required Parameters:
    - `file`: PDF file to process
    - `plugin_id`: ID of the plugin to use (e.g., "informed-consent-ki", "clinical-protocol")
  - Optional Parameters:
    - `template_id`: Specific template within the plugin
    - `parameters`: JSON string with additional configuration
  - Returns: JSON object with `sections` and `texts` arrays
- `POST /uploadfile/`: Process informed consent documents (uses the "informed-consent-ki" plugin)
  - Accepts: PDF file upload
  - Returns: JSON object containing sectioned summaries
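A call to `/generate/` is a multipart upload. The standard-library helper below builds such a request by hand to show exactly what goes over the wire; the helper itself is hypothetical, and only the field names (`file`, `plugin_id`) come from the API description above:

```python
import urllib.request
import uuid

def build_generate_request(url, pdf_bytes, plugin_id):
    """Build a multipart/form-data POST for /generate/ (stdlib-only sketch)."""
    boundary = uuid.uuid4().hex
    # plugin_id form field
    plugin_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="plugin_id"\r\n\r\n'
        f"{plugin_id}\r\n"
    ).encode()
    # file form field carrying the PDF bytes
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="doc.pdf"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = plugin_part + file_header + pdf_bytes + closing
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# urllib.request.urlopen(req) would send it once the server is running.
```

In practice `requests` or `httpx` make this a one-liner (`files={"file": ...}`), but the manual version is dependency-free and makes the wire format explicit.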
The application uses a multi-layered approach to process documents:
- Document Ingestion: PDF documents are processed using a custom PDF reader
- Text Processing: Documents are parsed into hierarchical nodes using LlamaIndex
- Embedding & Indexing: Text is embedded using an Azure OpenAI embedding model (e.g., `text-embedding-3-large`)
- Query Processing: Uses LlamaIndex with Azure OpenAI models for efficient document querying and reranking
- Summary Generation: Employs an Azure OpenAI chat model (e.g., `gpt-4o`) for generating clear, authoritative summaries
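The hierarchical parsing step can be pictured with a toy stand-in: the same text is split at several granularities, so coarse chunks preserve context while fine chunks give precision. This illustrates the idea only; it is not LlamaIndex's `HierarchicalNodeParser`, which additionally tracks parent/child links between levels so retrieval can merge up to a parent node:

```python
def hierarchical_chunks(text, chunk_sizes=(2048, 512, 128)):
    """Split `text` at several granularities, coarse to fine.

    Toy illustration of hierarchical parsing: each level is a flat split
    of the full text; the real parser also links children to parents.
    """
    return {
        size: [text[i:i + size] for i in range(0, len(text), size)]
        for size in chunk_sizes
    }
```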
The project follows these key principles:
- Concurrent processing for improved performance
- Hierarchical document parsing for better context understanding
- Error handling and input validation
- CORS middleware for frontend integration
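The concurrency principle above maps naturally onto `asyncio.gather`: each section's summarization runs as its own task, so slow model calls overlap instead of queuing. A minimal sketch, with the real Azure OpenAI call replaced by a placeholder:

```python
import asyncio

async def summarize_section(section: str) -> str:
    # Placeholder for the real Azure OpenAI call; any awaited I/O here
    # would overlap with the other sections' calls.
    await asyncio.sleep(0)
    return f"summary of {section}"

async def summarize_all(sections: list[str]) -> list[str]:
    # gather() runs all section tasks concurrently and preserves order.
    return await asyncio.gather(*(summarize_section(s) for s in sections))

summaries = asyncio.run(summarize_all(["Purpose", "Risks", "Benefits"]))
```

With N sections, total latency approaches that of the slowest single call rather than the sum of all calls.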
Key dependencies include:
- fastapi[standard] >= 0.113.0
- pydantic >= 2.7.0
- python-multipart
- pypdf
- llama-index
- llama-index-llms-azure-openai
- llama-index-embeddings-azure-openai
- python-dotenv
See requirements.txt for specific versions.
Start the FastAPI server:

```bash
uvicorn app.main:app --reload
```

The server will start on `http://localhost:8000` by default.