A Dockerized application for processing clinical documents with the DeepPhe natural language processing pipeline at the document level (not the patient level). The output does not strictly conform to the OMOP NoteNLP schema, but it presents the DeepPhe results in a tabular format that can be integrated into an OMOP ETL process by selecting the appropriate fields.
DeepPhe OMOP is a clinical text processing application that:
- Processes clinical documents using the DeepPhe NLP pipeline
- Extracts cancer-related information from unstructured text
- Writes the extracted data in a tabular format oriented toward the OMOP CDM
- Runs in a containerized environment for easy deployment and scalability
- Docker and Docker Compose installed
- Java 11 (for building the JAR file)
- Maven (for building the project)
deepphe-omop/
├── build-and-deploy.sh              # Main deployment script
├── docker-compose.yml               # Docker Compose configuration
├── Dockerfile                       # Container definition
├── target/
│   └── deepphe-omop-0.1.0.jar       # Application JAR (built by Maven)
├── src/main/resources/
│   ├── dphe-db-resources/           # Database resources
│   │   ├── neo4j/                   # Neo4j graph database
│   │   └── hsqldb/                  # HSQLDB relational database
│   └── pipeline/                    # Pipeline configurations
│       └── OmopDocRunner.piper      # Main pipeline configuration
├── data/
│   ├── input/                       # Input documents directory
│   └── output/                      # Output results directory
└── logs/                            # Application logs
First, build the JAR file using Maven 3.8+. The build requires dphe-nlp2 and dphe-onto-db2:
mvn clean package
Place your clinical documents in the input directory, organized by patient ID:
mkdir -p data/input
# Copy your clinical text files to data/input/patient_id/*.txt
Use the provided deployment script for easy management:
# Make the script executable
chmod +x build-and-deploy.sh
# Build and run in foreground
./build-and-deploy.sh run
# Or run in background
./build-and-deploy.sh run-bg
# Process sample data (if available)
./build-and-deploy.sh sample
The build-and-deploy.sh script provides several convenient commands:
| Command | Description |
|---|---|
| build | Build Docker image only |
| run | Build and run application (foreground) |
| run-bg | Build and run application (background) |
| sample | Process sample data and show results |
| stop | Stop all services |
| status | Show service status and useful info |
| logs | Show application logs |
| cleanup | Stop services and remove containers/volumes |
| help | Show help message |
If you prefer to use Docker directly:
# Build the image
docker compose build
# Run the application
docker compose up deepphe-omop
# Run in background
docker compose up -d deepphe-omop
# View logs
docker compose logs -f deepphe-omop
# Stop the application
docker compose down
- Input: Place clinical text files in data/input/
- Output: Processed results will appear in data/output/
- Logs: Application logs are available in the logs/ directory and via Docker logs
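Because input is expected to be grouped by patient ID (data/input/patient_id/*.txt), a flat export of notes has to be rearranged before processing. The following is a minimal sketch of that step. It assumes file names of the form `<patient_id>_<rest>.txt`, which is a hypothetical convention and not something DeepPhe OMOP defines; `stage_documents` is likewise an illustrative helper, not part of the project.

```shell
#!/usr/bin/env bash
# Sketch: copy a flat directory of *.txt notes into <input_dir>/<patient_id>/.
# ASSUMPTION: file names look like <patient_id>_<rest>.txt (hypothetical
# convention); files without such a prefix are skipped with a warning.

stage_documents() {
  local src_dir="$1" input_dir="$2"
  local file name patient_id
  for file in "$src_dir"/*.txt; do
    [ -e "$file" ] || continue            # glob matched nothing
    name=$(basename "$file")
    patient_id="${name%%_*}"              # text before the first underscore
    if [ "$patient_id" = "$name" ] || [ -z "$patient_id" ]; then
      echo "skipping $name: no patient ID prefix" >&2
      continue
    fi
    mkdir -p "$input_dir/$patient_id"
    cp "$file" "$input_dir/$patient_id/$name"
  done
}

# Example (hypothetical source directory):
#   stage_documents ./raw_notes data/input
```

Files are copied rather than moved so the original export stays untouched if the naming assumption turns out not to hold.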
The application supports the following environment variables:
- JAVA_OPTS: JVM options (default: -Xms512m -Xmx2048m -XX:+UseG1GC)
- APP_ENV: Application environment (set to docker in container)
The Docker container is configured with:
- Memory limit: 3GB
- Memory reservation: 1GB
Adjust these limits in docker-compose.yml if needed based on your data volume and system resources.
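When the container memory limit is raised, the JVM heap in JAVA_OPTS usually needs to be raised with it while staying below the limit. The sketch below derives a JAVA_OPTS string from a limit given in MB. It assumes the heap should be two thirds of the limit, which is simply the ratio of the documented defaults (-Xmx2048m under a 3GB limit) and not a rule stated by the project; `java_opts_for_limit` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: build a JAVA_OPTS value for a given container memory limit (in MB).
# ASSUMPTION: max heap = 2/3 of the limit, mirroring the default ratio
# (2048 MB heap, 3072 MB limit). -Xms and the GC flag are copied from the
# documented default.

java_opts_for_limit() {
  local limit_mb="$1"
  local heap_mb=$(( limit_mb * 2 / 3 ))
  if [ "$heap_mb" -lt 512 ]; then
    # -Xmx below -Xms512m would prevent the JVM from starting
    echo "limit of ${limit_mb} MB is too small for -Xms512m" >&2
    return 1
  fi
  echo "-Xms512m -Xmx${heap_mb}m -XX:+UseG1GC"
}

# Example:
#   java_opts_for_limit 6144
```

One possible way to apply the result is `docker compose run --rm -e JAVA_OPTS="$(java_opts_for_limit 6144)" deepphe-omop`. Whether the value reaches the JVM depends on how the image's entrypoint reads JAVA_OPTS, and the memory limit itself still has to be changed in docker-compose.yml.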
The main pipeline configuration is located at:
src/main/resources/pipeline/OmopDocRunner.piper
This file contains paths to databases and other pipeline settings. The Docker setup automatically maps these to container-appropriate paths.
The application includes embedded databases:
- Neo4j: Graph database containing the DeepPhe knowledge base (Embedded format)
- HSQLDB: Relational database for OMOP CDM storage (Embedded format)
These databases are automatically included in the Docker image and don't require separate setup.
# Show service status
./build-and-deploy.sh status
# Show application logs
./build-and-deploy.sh logs
# Shell into the running container
docker compose exec deepphe-omop sh
# Check container resource usage
docker stats deepphe-omop-app
- JAR file not found: Ensure you've built the project with mvn clean package
- Out of memory errors: Increase memory limits in docker-compose.yml
- Empty output: Check input file format and logs for processing errors
- Permission issues: The container runs as a non-root user; ensure file permissions are correct
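Several of the problems above can be caught before the container is started. The following sketch checks for the JAR, for at least one .txt input file, and for an output directory. The paths come from the project layout; the `preflight` function itself is hypothetical and not part of build-and-deploy.sh.

```shell
#!/usr/bin/env bash
# Sketch: pre-run checks for the most common setup problems.
# NOTE: the writability test reflects the current host user, not the
# container's non-root user, so it is only a rough indicator.

preflight() {
  local root="${1:-.}" problems=0
  if [ ! -f "$root/target/deepphe-omop-0.1.0.jar" ]; then
    echo "JAR not found: run 'mvn clean package' first" >&2
    problems=$((problems + 1))
  fi
  if [ -z "$(find "$root/data/input" -type f -name '*.txt' 2>/dev/null | head -n 1)" ]; then
    echo "no .txt files found under data/input" >&2
    problems=$((problems + 1))
  fi
  if [ ! -d "$root/data/output" ] || [ ! -w "$root/data/output" ]; then
    echo "data/output is missing or not writable" >&2
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]                   # exit status 0 only if all checks pass
}

# Example, run from the repository root:
#   preflight . && ./build-and-deploy.sh run
```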
# Clone the repository
git clone <repository-url>
cd deepphe-omop
# Build the project
mvn clean package
# Build Docker image
docker compose build
- Edit pipeline configuration in src/main/resources/pipeline/OmopDocRunner.piper
- Adjust Docker settings in docker-compose.yml
- Modify the deployment script build-and-deploy.sh for custom workflows
- The application processes documents sequentially
- Memory usage scales with document size and complexity
- For large document sets, consider:
  - Increasing memory limits
  - Processing documents in batches
  - Using faster storage for input/output directories
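Batch processing can be prepared outside the container by splitting the per-patient directories into fixed-size groups. The sketch below shows one way to do that. It assumes each immediate subdirectory of the source is one patient; the `batch_NNN` naming and the `make_batches` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy per-patient directories into batch_001/, batch_002/, ...
# ASSUMPTION: every immediate subdirectory of src_dir is one patient.

make_batches() {
  local src_dir="$1" dest_dir="$2" batch_size="$3"
  local count=0 batch=1 patient_dir batch_dir
  [ "$batch_size" -gt 0 ] || return 1     # avoid division by zero below
  for patient_dir in "$src_dir"/*/; do
    [ -d "$patient_dir" ] || continue     # glob matched nothing
    patient_dir="${patient_dir%/}"        # drop trailing slash so cp copies the directory itself
    batch_dir=$(printf '%s/batch_%03d' "$dest_dir" "$batch")
    mkdir -p "$batch_dir"
    cp -R "$patient_dir" "$batch_dir/"
    count=$((count + 1))
    if [ $((count % batch_size)) -eq 0 ]; then
      batch=$((batch + 1))                # next patient starts a new batch
    fi
  done
}

# Example (hypothetical directories): 50 patients per batch
#   make_batches ./all_patients ./batches 50
```

Each batch directory can then be placed in data/input for a separate run, with data/output collected between runs. How the container is pointed at a different input directory depends on the volume mapping in docker-compose.yml.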
- The container runs as a non-root user (appuser)
- Input directory is mounted read-only
- No network ports are exposed by default (unless your application requires it)
For issues and questions:
- Check the application logs for error messages
- Verify input file formats match expected requirements
- Ensure adequate system resources are available
- Review the pipeline configuration for path and database issues
Apache 2.0
Please create a pull request with a description for review.