Add OpenTelemetry Observability Support #12

@tomiok

📋 Description

Integrate OpenTelemetry to provide comprehensive observability for Queuety message broker operations. This will enable users to monitor performance, track message flows, and debug issues using their preferred observability platform.

🎯 Problem Statement

Currently, Queuety lacks observability features, making it difficult for users to:

  • Monitor message throughput and latency
  • Track topic and subscriber metrics
  • Debug connection and persistence issues
  • Identify performance bottlenecks
  • Set up production monitoring and alerting

🚀 Proposed Solution

Implement a dual observability approach:

  • Prometheus for metrics collection and monitoring
  • OpenTelemetry for distributed tracing

Metrics (Prometheus)

Expose Prometheus metrics at the /metrics endpoint:

  • queuety_messages_published_total{topic} - Total published messages
  • queuety_messages_delivered_total{topic} - Total delivered messages
  • queuety_messages_failed_total{topic, reason} - Failed message deliveries
  • queuety_message_processing_seconds{topic, operation} - Processing latency histogram
  • queuety_topics_total - Number of active topics
  • queuety_subscribers_total{topic} - Subscribers per topic
  • queuety_active_connections - Current TCP connections count
  • queuety_badger_operations_total{operation, status} - BadgerDB operation metrics
  • queuety_auth_attempts_total{result} - Authentication attempts tracking

Distributed Tracing (OpenTelemetry)

Export traces to OTLP-compatible backends:

  • Message lifecycle spans: publish → persist → deliver → ack
  • BadgerDB operations: save, update, delete operations
  • Connection flows: accept, authenticate, disconnect
  • Topic management: create topic, add subscriber operations

Supported Tracing Backends

  • Jaeger - Open source distributed tracing
  • Datadog APM - Enterprise tracing and APM
  • New Relic - Full observability platform
  • OTLP Generic - Any OpenTelemetry-compatible backend (Zipkin, etc.)

📝 Implementation Plan

Phase 1: Core Infrastructure

  • Create telemetry/ package structure
  • Implement base Telemetry struct and configuration
  • Add Prometheus exporter support
  • Environment variable configuration

Phase 2: Instrumentation

  • Instrument message publish/deliver operations
  • Add BadgerDB persistence metrics
  • Track TCP connection lifecycle
  • Implement distributed tracing spans

Phase 3: Multi-Backend Support

  • Add Datadog OTLP integration
  • Implement New Relic support
  • Generic OTLP exporter for other backends
  • Jaeger tracing exporter

Phase 4: Documentation & Examples

  • Docker Compose examples with Prometheus/Grafana
  • Configuration documentation
  • Grafana dashboard templates
  • Production deployment guides

🔧 Technical Details

Configuration

# Environment variables
QUEUETY_TELEMETRY_ENABLED=true
# One of: prometheus, datadog, newrelic, otlp, jaeger
QUEUETY_TELEMETRY_BACKEND=prometheus
QUEUETY_TELEMETRY_ENDPOINT=https://custom-endpoint
QUEUETY_TELEMETRY_API_KEY=your-api-key

Dependencies

go.opentelemetry.io/otel v1.24.0
go.opentelemetry.io/otel/exporters/prometheus v0.46.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.24.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.24.0
go.opentelemetry.io/otel/sdk v1.24.0

Code Structure

queuety/
├── telemetry/
│   ├── telemetry.go    # Main setup and providers
│   └── metrics.go      # Metrics definitions and helpers
├── server/
│   ├── server.go       # Instrumented message handling
│   └── persistence.go  # BadgerDB metrics
└── examples/
    └── docker-compose-prometheus.yml

✅ Benefits

For Users

  • Production Ready: Monitor Queuety in production environments
  • Vendor Freedom: Choose any observability platform
  • Debug Capabilities: Trace message flows and identify bottlenecks
  • Performance Insights: Understand throughput and latency patterns
  • Alerting: Set up proactive monitoring and alerts

For Project

  • Enterprise Adoption: Makes Queuety suitable for production use
  • Community Growth: Observability is essential for serious deployments
  • Debugging: Easier to troubleshoot issues and performance problems
  • Competitive Advantage: Many message brokers lack comprehensive observability

🎨 Example Usage

Prometheus + Grafana

# Start with observability stack
docker-compose -f examples/docker-compose-prometheus.yml up

# View metrics
curl http://localhost:9090/metrics | grep queuety

Datadog Integration

QUEUETY_TELEMETRY_ENABLED=true \
QUEUETY_TELEMETRY_BACKEND=datadog \
QUEUETY_TELEMETRY_API_KEY=${DD_API_KEY} \
./queuety

🔄 Backward Compatibility

  • Telemetry is disabled by default - zero impact on existing deployments
  • No breaking changes to existing APIs
  • Optional at runtime - OTel providers and exporters are initialized only when telemetry is enabled
  • Graceful degradation - Server continues running if telemetry setup fails

📊 Success Criteria

  • Metrics are exported correctly to all supported backends
  • Distributed traces show complete message lifecycle
  • No measurable performance impact when telemetry is disabled
  • Documentation includes setup guides for each backend
  • Example dashboards provided for Grafana
  • CI/CD tests verify telemetry functionality

🀝 Contributing

This is a significant feature that would benefit from community input:

  • Feedback on metric names and labels
  • Additional backend support requests
  • Dashboard and alerting rule contributions
  • Documentation improvements
  • Testing on different deployment scenarios

🏷️ Labels

enhancement observability monitoring production-ready good-first-issue help-wanted


Priority: High - Observability is crucial for production message broker deployments

Effort: Large - Comprehensive feature requiring instrumentation across the codebase

Impact: High - Enables enterprise adoption and production monitoring capabilities
