This directory contains example configuration files for Documentor.
- config.yaml: Main configuration file for PDF and DOCX parsers
- ocr_config.yaml: OCR service configuration (Dots.OCR)
- llm_config.yaml: LLM service configuration (for future use)
See also: ../env.example for environment variables configuration (API keys, secrets).
Copy the example config files to your project and customize them:
from documentor.processing.parsers.pdf import PdfParser
# Use custom config file
parser = PdfParser(config_path="/path/to/your/config.yaml")

Pass configuration directly as a dictionary:
from documentor.processing.parsers.pdf import PdfParser
# Use custom config dictionary
parser = PdfParser(config_dict={
    "pdf_parser": {
        "layout_detection": {
            "render_scale": 3.0
        },
        "processing": {
            "skip_title_page": True
        }
    }
})

If you don't specify config_path or config_dict, parsers use the default internal configuration:
from documentor.processing.parsers.pdf import PdfParser
# Uses default internal config
parser = PdfParser()

When both config_path and config_dict are provided, config_dict takes priority.
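The precedence described above can be pictured with a small stand-alone sketch. It is not Documentor's implementation: `select_config`, `load_file`, and `defaults` are hypothetical names introduced only for illustration, and the sketch assumes the simplest reading of "takes priority", namely that the dictionary is used and the file is ignored. Whether the library merges the two sources instead is not stated here.

```python
from typing import Any, Callable, Dict, Optional

Config = Dict[str, Any]


def select_config(
    config_path: Optional[str],
    config_dict: Optional[Config],
    load_file: Callable[[str], Config],
    defaults: Config,
) -> Config:
    """Pick one configuration source (hypothetical helper).

    Order: explicit dictionary, then file on disk, then built-in defaults.
    """
    if config_dict is not None:
        return config_dict
    if config_path is not None:
        return load_file(config_path)
    return defaults


def fake_loader(path: str) -> Config:
    """Stand-in for a YAML loader so the sketch runs without extra packages."""
    return {"source": "file", "path": path}


builtin_defaults: Config = {"source": "defaults"}

# Both arguments given: the dictionary wins and the file is never read.
chosen = select_config(
    "/path/to/your/config.yaml",
    {"source": "dict"},
    fake_loader,
    builtin_defaults,
)
```

Passing the loader in as an argument keeps the sketch free of third-party imports; in real use a YAML loader would take its place.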
- layout_detection: OCR rendering and layout detection settings
- processing: Document processing options
- filtering: Element filtering (headers, footers)
- table_parsing: Table detection and parsing
- header_analysis: Header level determination

- layout_detection: OCR rendering settings
- processing: Document processing options
- scanned_detection: Scanned document detection thresholds
- hierarchy: Text block merging settings
For sensitive configuration (API keys, secrets), use environment variables in a .env file.
- Copy the example: cp examples/env.example .env
- Edit .env and fill in your actual values
- Never commit .env to version control
See ../env.example for all available environment variables.
- Environment variables (.env file) - for secrets and sensitive data
- Config files (config.yaml, ocr_config.yaml) - for non-secret settings
- Default values - fallback
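Read as a lookup order, the three layers above could be resolved as in the following sketch. It is an illustration under assumptions, not the library's code: `lookup_setting`, the `OCR_API_KEY` variable, and the `api_key` setting name are all hypothetical, and the real variable names are the ones listed in ../env.example.

```python
import os
from typing import Any, Mapping, Optional


def lookup_setting(
    name: str,
    env_var: str,
    file_config: Mapping[str, Any],
    defaults: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Resolve one setting: environment first, then config file, then default.

    `environ` can be injected for testing; it falls back to os.environ.
    """
    env = os.environ if environ is None else environ
    if env_var in env:
        return env[env_var]  # environment values are always strings
    if name in file_config:
        return file_config[name]
    return defaults.get(name)


# Hypothetical names, used only to exercise the lookup order.
file_config = {"api_key": "from-file"}
defaults = {"api_key": None}

from_env = lookup_setting(
    "api_key", "OCR_API_KEY", file_config, defaults,
    environ={"OCR_API_KEY": "from-env"},
)
from_file = lookup_setting(
    "api_key", "OCR_API_KEY", file_config, defaults, environ={},
)
```

Because environment values arrive as strings, numeric or boolean settings read this way would need explicit conversion.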
- Configuration files are YAML format
- All settings are optional - missing values will use defaults
- You can override only specific sections - other sections will use defaults
- Internal config files are kept for backward compatibility but should not be modified
- Use a .env file for API keys and secrets, and config files for other settings
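Overriding only specific sections while the rest fall back to defaults is typically done with a recursive merge. The sketch below shows one way this could work; `deep_merge` is a hypothetical helper, not a Documentor function, and the default values shown (`render_scale` of 2.0, `skip_title_page` of False) are placeholders invented for the example, not the library's actual defaults.

```python
import copy
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: `overrides` layered on top of `defaults`.

    Nested dictionaries are merged key by key; any other value replaces
    the default outright. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults (assumed values, for illustration only).
placeholder_defaults = {
    "pdf_parser": {
        "layout_detection": {"render_scale": 2.0},
        "processing": {"skip_title_page": False},
    }
}

# The user overrides a single value; everything else stays at its default.
user_config = {"pdf_parser": {"layout_detection": {"render_scale": 3.0}}}

effective = deep_merge(placeholder_defaults, user_config)
```

A shallow `dict.update` would not be enough here, since it would replace the whole `pdf_parser` section and drop the untouched `processing` settings.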