[AWS] Story 5: Observability (CloudWatch) #180

@mfittko

Summary

Implement observability: CloudWatch log groups, dashboards, alarms, and optional X-Ray tracing.

Epic: #174
Architecture: docs/architecture/planned/aws-ecs-cdk.md


Tasks

CloudWatch Logs

  • Create log group for Proxy service
  • Create log group for Dispatcher service
  • Configure log retention (14 days default, configurable)
  • Set up log metric filters for errors
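A minimal sketch of how the configurable retention could be validated before it reaches the log groups. CloudWatch Logs only accepts a fixed set of retention periods, so an arbitrary number of days would be rejected at deploy time. The helper name `resolveRetentionDays` and the `level` field in the filter pattern are assumptions for illustration, not part of this story.

```typescript
// Retention periods (in days) that CloudWatch Logs accepts for a log group.
const ALLOWED_RETENTION_DAYS: readonly number[] = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
  1096, 1827, 2192, 2557, 2922, 3288, 3653,
];

// Default taken from the task list above.
const DEFAULT_RETENTION_DAYS = 14;

// Hypothetical helper: returns the requested retention (or the default)
// and throws for values CloudWatch Logs would not accept.
function resolveRetentionDays(requested?: number): number {
  const days = requested === undefined ? DEFAULT_RETENTION_DAYS : requested;
  if (ALLOWED_RETENTION_DAYS.indexOf(days) === -1) {
    throw new Error(`Unsupported log retention: ${days} days`);
  }
  return days;
}

// JSON filter pattern for an error metric filter.
// Assumption: services log JSON objects with a "level" field.
const ERROR_FILTER_PATTERN = '{ $.level = "error" }';
```

The same pattern string could back the error metric filters for both the Proxy and Dispatcher log groups, provided both services use the same log schema.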

CloudWatch Dashboard

  • Create dashboard with key metrics:
    • Request count and latency (ALB)
    • Error rate (4xx, 5xx)
    • ECS CPU and memory utilization
    • Aurora connections and latency
    • Redis cache hit rate
    • Task count (running vs desired)

CloudWatch Alarms

  • High error rate alarm (>5% 5xx errors)
  • High latency alarm (p99 > 2s)
  • Low healthy host count
  • High CPU utilization (>80% sustained)
  • Database connection failures
  • Configure SNS topic for notifications
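The alarm conditions above can be sketched as plain functions, which may help when agreeing on thresholds before wiring them into CloudWatch. This is only an illustration: the type and function names are hypothetical, and the number of periods that counts as "sustained" is an assumption the story does not specify.

```typescript
// One ALB datapoint for a single evaluation period.
interface AlbDatapoint {
  requestCount: number;   // ALB RequestCount
  target5xxCount: number; // ALB HTTPCode_Target_5XX_Count
}

// 5xx responses as a percentage of all requests; 0 when there was no traffic.
function errorRatePercent(p: AlbDatapoint): number {
  if (p.requestCount <= 0) return 0;
  return (100 * p.target5xxCount) / p.requestCount;
}

// True when each of the last `periods` values is strictly above `threshold`.
// Assumption: "sustained" means N consecutive breaching periods.
function breachesSustained(
  values: number[],
  threshold: number,
  periods: number,
): boolean {
  if (periods <= 0 || values.length < periods) return false;
  return values.slice(-periods).every((v) => v > threshold);
}
```

For example, `errorRatePercent({ requestCount: 200, target5xxCount: 12 })` gives 6, which would breach the 5% threshold, while CPU samples of `[85, 90, 75]` would not count as three sustained periods above 80.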

X-Ray Tracing (Optional)

  • Enable X-Ray daemon sidecar (only when enableXRay is true)
  • Configure sampling rules
  • Document tracing setup
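As a starting point for the sampling-rule task, here is a sketch of a rule object using the field names from the X-Ray sampling rule API, plus a small sanity check. The rule name and the chosen values are hypothetical. The reservoir of 1 request per second and the 5% fixed rate mirror the X-Ray default rule.

```typescript
// Shape of an X-Ray sampling rule (field names follow the X-Ray API).
interface SamplingRule {
  RuleName: string;
  Priority: number;      // 1 to 9999, lower values are evaluated first
  FixedRate: number;     // fraction (0 to 1) sampled once the reservoir is used
  ReservoirSize: number; // requests per second sampled before FixedRate applies
  ServiceName: string;
  ServiceType: string;
  Host: string;
  HTTPMethod: string;
  URLPath: string;
  ResourceARN: string;
  Version: number;
}

// Hypothetical rule matching all traffic.
const exampleRule: SamplingRule = {
  RuleName: "proxy-default", // hypothetical name
  Priority: 1000,
  FixedRate: 0.05,
  ReservoirSize: 1,
  ServiceName: "*",
  ServiceType: "*",
  Host: "*",
  HTTPMethod: "*",
  URLPath: "*",
  ResourceARN: "*",
  Version: 1,
};

// Returns a list of problems; an empty list means the rule looks sane.
function validateSamplingRule(rule: SamplingRule): string[] {
  const problems: string[] = [];
  if (rule.Priority < 1 || rule.Priority > 9999) problems.push("Priority out of range");
  if (rule.FixedRate < 0 || rule.FixedRate > 1) problems.push("FixedRate must be between 0 and 1");
  if (rule.ReservoirSize < 0) problems.push("ReservoirSize must not be negative");
  if (rule.Version !== 1) problems.push("Version must be 1");
  return problems;
}
```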

Configuration Props

logRetentionDays?: number;       // default: 14
enableAlarms?: boolean;          // default: true
alarmEmail?: string;             // SNS notification email
enableXRay?: boolean;            // default: false
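One way the defaults noted in the comments above could be applied is sketched below. The interface and function names (`ObservabilityProps`, `resolveObservabilityProps`) are hypothetical; only the four property names and their defaults come from this story.

```typescript
// The optional props listed above, gathered in a hypothetical interface.
interface ObservabilityProps {
  logRetentionDays?: number;
  enableAlarms?: boolean;
  alarmEmail?: string;
  enableXRay?: boolean;
}

// Same props after defaults are applied; alarmEmail has no default.
interface ResolvedObservabilityProps {
  logRetentionDays: number;
  enableAlarms: boolean;
  alarmEmail: string | undefined;
  enableXRay: boolean;
}

// Fills in the documented defaults for anything the caller left unset.
function resolveObservabilityProps(
  props: ObservabilityProps = {},
): ResolvedObservabilityProps {
  return {
    logRetentionDays: props.logRetentionDays ?? 14,
    enableAlarms: props.enableAlarms ?? true,
    alarmEmail: props.alarmEmail,
    enableXRay: props.enableXRay ?? false,
  };
}
```

Using `??` rather than `||` keeps an explicit `enableAlarms: false` from being overwritten by the default.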

Dashboard Widgets

Widget               Metric                     Source
Request Rate         RequestCount               ALB
Latency p50/p95/p99  TargetResponseTime         ALB
Error Rate           HTTPCode_Target_5XX_Count  ALB
CPU Utilization      CPUUtilization             ECS
Memory Utilization   MemoryUtilization          ECS
Task Count           RunningTaskCount           ECS (Container Insights)
DB Connections       DatabaseConnections        Aurora
Cache Hit Rate       CacheHitRate               Redis
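The widget list above maps fairly directly onto the JSON body that a CloudWatch dashboard expects. The sketch below builds that body from a list of widget specs and lays the widgets out two per row on the 24-column grid. The type and function names are hypothetical, and the 60-second period and 12x6 widget size are arbitrary choices for illustration.

```typescript
// One dashboard widget, described by the metric it plots.
interface WidgetSpec {
  title: string;
  namespace: string;              // e.g. "AWS/ApplicationELB", "AWS/ECS"
  metric: string;
  stat: string;                   // e.g. "Sum", "Average", "p99"
  dimensions: [string, string][]; // [name, value] pairs
}

// Builds one metric widget, placing widgets two per row.
function toMetricWidget(spec: WidgetSpec, index: number, region: string) {
  const dims: string[] = [];
  for (const [name, value] of spec.dimensions) {
    dims.push(name, value);
  }
  return {
    type: "metric",
    x: (index % 2) * 12,
    y: Math.floor(index / 2) * 6,
    width: 12,
    height: 6,
    properties: {
      title: spec.title,
      region,
      period: 60,
      stat: spec.stat,
      metrics: [[spec.namespace, spec.metric, ...dims]],
    },
  };
}

// Serializes all widgets into a dashboard body string.
function buildDashboardBody(specs: WidgetSpec[], region: string): string {
  const widgets = specs.map((spec, i) => toMetricWidget(spec, i, region));
  return JSON.stringify({ widgets });
}
```

A spec such as `{ title: "Request Rate", namespace: "AWS/ApplicationELB", metric: "RequestCount", stat: "Sum", dimensions: [["LoadBalancer", "<alb-id>"]] }` would produce the first widget, with the real load balancer identifier supplied by the stack.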

Acceptance Criteria

  • Logs flow to CloudWatch with structured JSON
  • Dashboard shows all key metrics
  • Alarms trigger on error conditions
  • SNS notifications work (if configured)
  • Log retention policy applied
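For the structured JSON criterion, a minimal sketch of a log-line formatter follows. The field names (`timestamp`, `level`, `message`) are an assumed schema, not one defined by this story; whatever schema is chosen should match the metric filter patterns used for error counting.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Formats one log entry as a single-line JSON string.
// Extra fields are merged in first so they cannot overwrite the core fields.
function formatLogLine(
  level: LogLevel,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    ...fields,
    timestamp: now.toISOString(),
    level,
    message,
  });
}
```

Writing each entry as one line to stdout lets the ECS log driver forward it unchanged, so CloudWatch can match it against JSON filter patterns.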

Dependencies

  • Story 3: Compute Layer (ECS services to monitor)
  • Story 4: Networking (ALB metrics)

Estimated Effort

Medium - 2 days


Notes

  • CloudWatch Logs ~$0.50/GB ingested
  • Dashboard ~$3/month per dashboard
  • Consider log aggregation patterns for cost optimization
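A rough monthly estimate can be derived from the two prices above. The sketch assumes a 30-day month and ignores storage, alarms, and X-Ray charges; the function name is hypothetical.

```typescript
// Approximate prices quoted in the notes above.
const LOG_INGEST_USD_PER_GB = 0.5;
const DASHBOARD_USD_PER_MONTH = 3;

// Estimates monthly cost from daily log volume and dashboard count.
// Assumption: ingestion and dashboards are the only cost drivers counted.
function estimateMonthlyCostUsd(
  gbPerDay: number,
  dashboards: number,
  daysPerMonth: number = 30,
): number {
  const ingest = gbPerDay * daysPerMonth * LOG_INGEST_USD_PER_GB;
  return ingest + dashboards * DASHBOARD_USD_PER_MONTH;
}
```

At 2 GB of logs per day and one dashboard, this comes to about $33 per month (60 GB x $0.50 + $3), which shows that ingestion volume dominates the bill well before dashboards do.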
