Add metrics pipeline for DocumentAI job monitoring #25

@laurencegoolsby

Context

DocumentAI processes documents and generates job completion metrics (processing times, BDA results, quality indicators, error details). These metrics need to be collected, stored, and made queryable for operational monitoring and analytics.

Implementation

  1. Metrics Collection Pipeline:

    • Job completion events sent to SQS queue (documentai_job_completion_metrics)
    • Processor consumes queue and writes to S3 in partitioned structure: raw/date=YYYY-MM-DD/hour=HH/
    • Metrics include: job_id, trace_id, tenant_id, processing times, BDA results, document quality scores
  2. Query Infrastructure:

    • Glue table (documentai_metrics_job_status_table) with partition projection for automatic partition discovery
    • Athena workgroup (documentai_metrics) for SQL queries
    • Results stored in dedicated bucket with KMS encryption
  3. Infrastructure Components:

    • SQS queue with 14-day retention, KMS encryption
    • S3 buckets via storage module (automatic encryption, versioning, lifecycle)
    • Glue database and table with 40+ metric columns
    • IAM policies for queue access, S3 write, Athena query, Glue read
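
A minimal sketch of how the processor might derive the partitioned S3 key described above. The function name, the `.json` suffix, the one-object-per-job layout, and the choice to partition on UTC are assumptions for illustration; only the `raw/date=YYYY-MM-DD/hour=HH/` prefix comes from this issue.

```python
from datetime import datetime, timedelta, timezone


def metrics_object_key(job_id: str, completed_at: datetime, prefix: str = "raw") -> str:
    """Build an S3 key under <prefix>/date=YYYY-MM-DD/hour=HH/ (hypothetical helper)."""
    if completed_at.tzinfo is None:
        # Refuse naive timestamps so the partition never depends on the host's local time.
        raise ValueError("completed_at must be timezone-aware")
    ts = completed_at.astimezone(timezone.utc)  # assumption: partitions are in UTC
    return f"{prefix}/date={ts:%Y-%m-%d}/hour={ts:%H}/{job_id}.json"


# A job that finished at 23:30 in UTC-5 lands in the next day's 04 partition.
example_key = metrics_object_key(
    "job-123",
    datetime(2025, 3, 7, 23, 30, tzinfo=timezone(timedelta(hours=-5))),
)
```

Normalizing to one timezone before formatting keeps the `date` and `hour` values consistent with whatever range the Glue partition projection is configured for.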

Files:

  • infra/app-docai/service/metrics.tf - Metrics infrastructure
  • Application code TBD based on platform patterns
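
Since the application code is still TBD, the following is only a rough sketch of what a job completion message body could look like before it is sent to the queue. `job_id`, `trace_id`, and `tenant_id` come from this issue; the function name and the `status`, `processing_time_ms`, `completed_at`, and `document_quality_score` fields are hypothetical placeholders, not the final schema.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_job_completion_message(
    job_id: str,
    trace_id: str,
    tenant_id: str,
    started_at: datetime,
    completed_at: datetime,
    status: str,
    document_quality_score: Optional[float] = None,
) -> str:
    """Serialize one job completion event as a JSON string (hypothetical schema)."""
    # Integer division by a 1 ms timedelta gives whole milliseconds without float rounding.
    processing_time_ms = (completed_at - started_at) // timedelta(milliseconds=1)
    return json.dumps(
        {
            "job_id": job_id,
            "trace_id": trace_id,
            "tenant_id": tenant_id,
            "status": status,
            "processing_time_ms": processing_time_ms,
            "completed_at": completed_at.astimezone(timezone.utc).isoformat(),
            "document_quality_score": document_quality_score,
        },
        sort_keys=True,
    )


start = datetime(2025, 3, 7, 9, 30, 0, tzinfo=timezone.utc)
body = build_job_completion_message(
    "job-123", "trace-abc", "tenant-1", start, start + timedelta(seconds=2.5), "succeeded"
)
```

Keeping the body a flat JSON object would let each field map directly to a column in the Glue table, though the real column list is defined in `metrics.tf`.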

Acceptance Criteria

  • SQS queue created with KMS encryption and 14-day retention
  • Glue database and table created with partition projection (date/hour)
  • Athena workgroup configured with encrypted results bucket
  • IAM policies grant appropriate permissions for metrics pipeline
  • Metrics bucket uses storage module with automatic encryption
  • Partition projection covers a date range from one year before the current year to five years after it
  • All resources tagged and conditional on document_data_extraction_config != null
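
To make the partition projection criteria concrete, here is a sketch of the table parameters that Athena partition projection expects for a `date`/`hour` layout. It is written as a Python dict for illustration; in `metrics.tf` the same keys would go in the Glue table's parameters. The function name and bucket name are hypothetical, and the range assumes the criterion means one year before the current year through five years after it.

```python
def partition_projection_parameters(
    bucket: str, current_year: int, years_back: int = 1, years_forward: int = 5
) -> dict:
    """Return Glue table parameters enabling date/hour partition projection (sketch)."""
    start = f"{current_year - years_back}-01-01"
    end = f"{current_year + years_forward}-12-31"
    return {
        "projection.enabled": "true",
        # Daily partitions over a fixed window (assumed reading of the year range).
        "projection.date.type": "date",
        "projection.date.format": "yyyy-MM-dd",
        "projection.date.range": f"{start},{end}",
        "projection.date.interval": "1",
        "projection.date.interval.unit": "DAYS",
        # Hours 00 through 23, zero-padded to two digits to match hour=HH.
        "projection.hour.type": "integer",
        "projection.hour.range": "0,23",
        "projection.hour.digits": "2",
        # ${date} and ${hour} are placeholders that Athena substitutes at query time.
        "storage.location.template": f"s3://{bucket}/raw/date=${{date}}/hour=${{hour}}/",
    }


params = partition_projection_parameters("example-metrics-bucket", 2025)
```

With projection enabled, Athena computes partition locations from these properties instead of reading them from the Glue catalog, so no crawler or `MSCK REPAIR TABLE` run is needed as new hours arrive.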
