Context
DocumentAI processes documents and generates job completion metrics (processing times, BDA results, quality indicators, error details). These metrics need to be collected, stored, and made queryable for operational monitoring and analytics.
Implementation
Metrics Collection Pipeline:
- Job completion events sent to SQS queue (documentai_job_completion_metrics)
- Processor consumes queue and writes to S3 in partitioned structure: raw/date=YYYY-MM-DD/hour=HH/
- Metrics include: job_id, trace_id, tenant_id, processing times, BDA results, document quality scores
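As a rough illustration of the partitioned layout, the sketch below derives the raw/date=YYYY-MM-DD/hour=HH/ prefix from a job completion timestamp. The function names, the choice of UTC for partition boundaries, and the per-job file name are assumptions made for the example; the ticket only fixes the prefix format.

```python
from datetime import datetime, timezone


def partition_prefix(completed_at: datetime) -> str:
    """Return the raw/date=YYYY-MM-DD/hour=HH/ prefix for a completion time."""
    # Assumption: partitions are cut on UTC so all writers agree on boundaries.
    ts = completed_at.astimezone(timezone.utc)
    return f"raw/date={ts:%Y-%m-%d}/hour={ts:%H}/"


def object_key(completed_at: datetime, job_id: str) -> str:
    """Build a full S3 key; the <job_id>.json file name is hypothetical."""
    return f"{partition_prefix(completed_at)}{job_id}.json"
```

Zero-padding the hour (07 rather than 7) keeps the keys aligned with a two-digit hour partition on the query side.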
Query Infrastructure:
- Glue table (documentai_metrics_job_status_table) with partition projection for automatic partition discovery
- Athena workgroup (documentai_metrics) for SQL queries
- Results stored in dedicated bucket with KMS encryption
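For reference, partition projection is driven by table properties on the Glue table. The sketch below assembles one plausible set of those properties for date and hour partitions. The helper name, the bucket argument, and the explicit start/end dates are assumptions; the real values belong in metrics.tf.

```python
def projection_parameters(bucket: str, start_date: str, end_date: str) -> dict[str, str]:
    """Build Glue table properties for date/hour partition projection (sketch)."""
    return {
        "projection.enabled": "true",
        # date partition: one value per day within the configured range
        "projection.date.type": "date",
        "projection.date.format": "yyyy-MM-dd",
        "projection.date.range": f"{start_date},{end_date}",
        "projection.date.interval": "1",
        "projection.date.interval.unit": "DAYS",
        # hour partition: 00..23, zero-padded to two digits
        "projection.hour.type": "integer",
        "projection.hour.range": "0,23",
        "projection.hour.digits": "2",
        # ${date} and ${hour} are placeholders resolved by Athena at query time
        "storage.location.template": f"s3://{bucket}/raw/date=${{date}}/hour=${{hour}}/",
    }
```

With these properties Athena computes partition locations from the template, so no crawler or ALTER TABLE ADD PARTITION step is needed as new hours arrive.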
Infrastructure Components:
- SQS queue with 14-day retention, KMS encryption
- S3 buckets via storage module (automatic encryption, versioning, lifecycle)
- Glue database and table with 40+ metric columns
- IAM policies for queue access, S3 write, Athena query, Glue read
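To make the processor-side permissions concrete, here is a minimal sketch of a policy document covering queue access and S3 write only. The function name, statement IDs, and the restriction to the raw/ prefix are assumptions; KMS key permissions and the Athena and Glue read statements are left out and would be defined alongside the rest of the infrastructure.

```python
import json


def processor_policy(queue_arn: str, bucket_arn: str) -> dict:
    """Return a least-privilege policy sketch for the metrics processor."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and acknowledge job completion messages
                "Sid": "ConsumeMetricsQueue",
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
            {
                # Write metric objects under the raw/ prefix only (assumption)
                "Sid": "WriteRawMetrics",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"{bucket_arn}/raw/*",
            },
        ],
    }


def policy_json(queue_arn: str, bucket_arn: str) -> str:
    """Serialize the policy for use in an IAM policy resource."""
    return json.dumps(processor_policy(queue_arn, bucket_arn), indent=2)
```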
Files:
- infra/app-docai/service/metrics.tf - Metrics infrastructure
- Application code TBD based on platform patterns
Acceptance Criteria
- SQS queue created with KMS encryption and 14-day retention
- Glue database and table created with partition projection (date/hour)
- Athena workgroup configured with encrypted results bucket
- IAM policies grant appropriate permissions for metrics pipeline
- Metrics bucket uses storage module with automatic encryption
- Partition projection covers the current year -1 to +5 years
- All resources tagged and conditional on document_data_extraction_config != null