Add OpenTelemetry Observability Support
Description
Integrate OpenTelemetry to provide comprehensive observability for Queuety message broker operations. This will enable users to monitor performance, track message flows, and debug issues using their preferred observability platform.
Problem Statement
Currently, Queuety lacks observability features, making it difficult for users to:
- Monitor message throughput and latency
- Track topic and subscriber metrics
- Debug connection and persistence issues
- Identify performance bottlenecks
- Set up production monitoring and alerting
Proposed Solution
Implement a dual observability approach:
- Prometheus for metrics collection and monitoring
- OpenTelemetry for distributed tracing
Metrics (Prometheus)
Expose Prometheus metrics at /metrics endpoint:
- `queuety_messages_published_total{topic}` - Total published messages
- `queuety_messages_delivered_total{topic}` - Total delivered messages
- `queuety_messages_failed_total{topic, reason}` - Failed message deliveries
- `queuety_message_processing_seconds{topic, operation}` - Processing latency histogram
- `queuety_topics_total` - Number of active topics
- `queuety_subscribers_total{topic}` - Subscribers per topic
- `queuety_active_connections` - Current TCP connection count
- `queuety_badger_operations_total{operation, status}` - BadgerDB operation metrics
- `queuety_auth_attempts_total{result}` - Authentication attempt tracking
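As a minimal sketch, a couple of these instruments could be registered through the OpenTelemetry metric API with the Prometheus exporter listed under Dependencies. The `telemetry` package and all names below are assumptions from this proposal, not existing Queuety code:

```go
package telemetry

import (
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// Metrics bundles a subset of the instruments listed above (hypothetical shape).
type Metrics struct {
	MessagesPublished metric.Int64Counter
	ProcessingSeconds metric.Float64Histogram
}

// NewMetrics wires the Prometheus exporter into an OTel MeterProvider and
// creates the instruments. The exporter registers with the default Prometheus
// registry, which the server would expose at /metrics via promhttp.
func NewMetrics() (*Metrics, *sdkmetric.MeterProvider, error) {
	exporter, err := prometheus.New()
	if err != nil {
		return nil, nil, err
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	meter := provider.Meter("queuety")

	// The Prometheus exporter appends the conventional _total suffix to
	// counters, so this is exported as queuety_messages_published_total.
	published, err := meter.Int64Counter("queuety_messages_published",
		metric.WithDescription("Total published messages"))
	if err != nil {
		return nil, nil, err
	}
	latency, err := meter.Float64Histogram("queuety_message_processing_seconds",
		metric.WithDescription("Processing latency"))
	if err != nil {
		return nil, nil, err
	}
	return &Metrics{MessagesPublished: published, ProcessingSeconds: latency}, provider, nil
}
```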
Distributed Tracing (OpenTelemetry)
Export traces to OTLP-compatible backends:
- Message lifecycle spans: publish → persist → deliver → ack
- BadgerDB operations: save, update, delete operations
- Connection flows: accept, authenticate, disconnect
- Topic management: create topic, add subscriber operations
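For illustration, the publish → persist → deliver part of that lifecycle could be expressed as nested spans roughly like this (`Message`, `persist`, and `deliver` are placeholders, not Queuety's actual types or functions):

```go
package server

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

// Placeholders standing in for Queuety's real message type and handlers.
type Message struct{ ID, Body string }

func persist(Message) error         { return nil }
func deliver(string, Message) error { return nil }

var tracer = otel.Tracer("queuety/server")

// publishTraced records a parent span for the publish and child spans for
// the persist and deliver steps, so a backend can show the full lifecycle.
func publishTraced(ctx context.Context, topic string, msg Message) error {
	ctx, span := tracer.Start(ctx, "queuety.publish",
		trace.WithAttributes(attribute.String("queuety.topic", topic)))
	defer span.End()

	// persist: BadgerDB save as a child span
	_, persistSpan := tracer.Start(ctx, "queuety.persist")
	err := persist(msg)
	persistSpan.End()
	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, "persist failed")
		return err
	}

	// deliver: fan-out to subscribers as a child span
	_, deliverSpan := tracer.Start(ctx, "queuety.deliver")
	defer deliverSpan.End()
	return deliver(topic, msg)
}
```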
Supported Tracing Backends
- Jaeger - Open source distributed tracing
- Datadog APM - Enterprise tracing and APM
- New Relic - Full observability platform
- OTLP Generic - Any OpenTelemetry-compatible backend (Zipkin, etc.)
Implementation Plan
Phase 1: Core Infrastructure
- Create `telemetry/` package structure
- Implement base `Telemetry` struct and configuration (see sketch below)
- Add Prometheus exporter support
- Environment variable configuration
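One possible shape for the base type in the proposed `telemetry/` package; every name here is a suggestion for discussion, not an existing API:

```go
package telemetry

import (
	"context"

	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// Telemetry bundles the OTel providers so the server holds a single handle
// and can shut both down cleanly on exit. A nil *Telemetry is a no-op,
// which keeps the disabled-by-default path trivial.
type Telemetry struct {
	Meters  *sdkmetric.MeterProvider
	Tracers *sdktrace.TracerProvider
}

// Shutdown flushes and stops whichever providers were configured.
func (t *Telemetry) Shutdown(ctx context.Context) error {
	if t == nil {
		return nil
	}
	if t.Tracers != nil {
		if err := t.Tracers.Shutdown(ctx); err != nil {
			return err
		}
	}
	if t.Meters != nil {
		return t.Meters.Shutdown(ctx)
	}
	return nil
}
```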
Phase 2: Instrumentation
- Instrument message publish/deliver operations
- Add BadgerDB persistence metrics
- Track TCP connection lifecycle
- Implement distributed tracing spans
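As a sketch of what the publish-path instrumentation might look like, using the hypothetical instrument bundle from the metrics section above:

```go
package server

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// instruments stands in for the hypothetical telemetry.Metrics bundle.
type instruments struct {
	published metric.Int64Counter
	latency   metric.Float64Histogram
}

// recordPublish wraps an existing publish call with the per-topic counter
// and the per-topic/operation latency histogram proposed above.
func recordPublish(ctx context.Context, m instruments, topic string, publish func() error) error {
	start := time.Now()
	err := publish()

	m.published.Add(ctx, 1,
		metric.WithAttributes(attribute.String("topic", topic)))
	m.latency.Record(ctx, time.Since(start).Seconds(),
		metric.WithAttributes(
			attribute.String("topic", topic),
			attribute.String("operation", "publish")))
	return err
}
```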
Phase 3: Multi-Backend Support
- Add Datadog OTLP integration
- Implement New Relic support
- Generic OTLP exporter for other backends
- Jaeger tracing exporter
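Since Datadog, New Relic, and most collectors accept OTLP over HTTP, the backend switch could largely reduce to endpoint and header selection on the generic OTLP trace exporter. The header names below are placeholders; real values must come from each vendor's OTLP ingest documentation:

```go
package telemetry

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newTracerProvider builds an OTLP/HTTP trace exporter for the configured
// backend. Endpoints and header names are illustrative only.
func newTracerProvider(ctx context.Context, backend, endpoint, apiKey string) (*sdktrace.TracerProvider, error) {
	opts := []otlptracehttp.Option{otlptracehttp.WithEndpoint(endpoint)}

	switch backend {
	case "datadog", "newrelic":
		// Vendor backends authenticate with an API-key header; the exact
		// header name differs per vendor and is a placeholder here.
		opts = append(opts, otlptracehttp.WithHeaders(map[string]string{"api-key": apiKey}))
	default:
		// Jaeger (which accepts OTLP natively) and local collectors
		// typically take unauthenticated, plaintext OTLP.
		opts = append(opts, otlptracehttp.WithInsecure())
	}

	exporter, err := otlptracehttp.New(ctx, opts...)
	if err != nil {
		return nil, err
	}
	return sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter)), nil
}
```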
Phase 4: Documentation & Examples
- Docker Compose examples with Prometheus/Grafana
- Configuration documentation
- Grafana dashboard templates
- Production deployment guides
Technical Details
Configuration
```bash
# Environment variables
QUEUETY_TELEMETRY_ENABLED=true
QUEUETY_TELEMETRY_BACKEND=prometheus|datadog|newrelic|otlp|jaeger
QUEUETY_TELEMETRY_ENDPOINT=https://custom-endpoint
QUEUETY_TELEMETRY_API_KEY=your-api-key
```
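A minimal sketch of reading these variables into a config struct inside the proposed `telemetry` package (the struct and defaults are assumptions, not an existing API):

```go
package telemetry

import "os"

// Config mirrors the QUEUETY_TELEMETRY_* environment variables above.
type Config struct {
	Enabled  bool
	Backend  string // prometheus | datadog | newrelic | otlp | jaeger
	Endpoint string
	APIKey   string
}

// ConfigFromEnv leaves telemetry disabled unless explicitly enabled and
// falls back to the Prometheus backend when none is set.
func ConfigFromEnv() Config {
	cfg := Config{
		Enabled:  os.Getenv("QUEUETY_TELEMETRY_ENABLED") == "true",
		Backend:  os.Getenv("QUEUETY_TELEMETRY_BACKEND"),
		Endpoint: os.Getenv("QUEUETY_TELEMETRY_ENDPOINT"),
		APIKey:   os.Getenv("QUEUETY_TELEMETRY_API_KEY"),
	}
	if cfg.Backend == "" {
		cfg.Backend = "prometheus"
	}
	return cfg
}
```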
Dependencies
```
go.opentelemetry.io/otel v1.24.0
go.opentelemetry.io/otel/exporters/prometheus v0.46.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.24.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.24.0
go.opentelemetry.io/otel/sdk v1.24.0
```
Code Structure
```
queuety/
├── telemetry/
│   ├── telemetry.go       # Main setup and providers
│   └── metrics.go         # Metrics definitions and helpers
├── server/
│   ├── server.go          # Instrumented message handling
│   └── persistence.go     # BadgerDB metrics
└── examples/
    └── docker-compose-prometheus.yml
```
Benefits
For Users
- Production Ready: Monitor Queuety in production environments
- Vendor Freedom: Choose any observability platform
- Debug Capabilities: Trace message flows and identify bottlenecks
- Performance Insights: Understand throughput and latency patterns
- Alerting: Set up proactive monitoring and alerts
For Project
- Enterprise Adoption: Makes Queuety suitable for production use
- Community Growth: Observability is essential for serious deployments
- Debugging: Easier to troubleshoot issues and performance problems
- Competitive Advantage: Many message brokers lack comprehensive observability
Example Usage
Prometheus + Grafana
```bash
# Start with observability stack
docker-compose -f examples/docker-compose-prometheus.yml up

# View metrics
curl http://localhost:9090/metrics | grep queuety
```
Datadog Integration
```bash
QUEUETY_TELEMETRY_ENABLED=true \
QUEUETY_TELEMETRY_BACKEND=datadog \
QUEUETY_TELEMETRY_API_KEY=${DD_API_KEY} \
./queuety
```
Backward Compatibility
- Telemetry is disabled by default - zero impact on existing deployments
- No breaking changes to existing APIs
- Optional dependencies - OTEL libs only loaded when telemetry enabled
- Graceful degradation - Server continues running if telemetry setup fails
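The graceful-degradation behaviour could look roughly like this at server startup; `telemetry.Setup` and the import path are hypothetical names from this proposal:

```go
package server

import (
	"context"
	"log"

	"queuety/telemetry" // hypothetical import path
)

// startTelemetry never aborts startup: if telemetry is disabled or its
// setup fails, the broker simply runs without it.
func startTelemetry(ctx context.Context) *telemetry.Telemetry {
	cfg := telemetry.ConfigFromEnv()
	if !cfg.Enabled {
		return nil
	}
	tel, err := telemetry.Setup(ctx, cfg) // hypothetical constructor
	if err != nil {
		log.Printf("telemetry disabled: setup failed: %v", err)
		return nil
	}
	return tel
}
```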
Success Criteria
- Metrics are exported correctly to all supported backends
- Distributed traces show complete message lifecycle
- Zero performance impact when telemetry disabled
- Documentation includes setup guides for each backend
- Example dashboards provided for Grafana
- CI/CD tests verify telemetry functionality
Contributing
This is a significant feature that would benefit from community input:
- Feedback on metric names and labels
- Additional backend support requests
- Dashboard and alerting rule contributions
- Documentation improvements
- Testing on different deployment scenarios
Labels
enhancement observability monitoring production-ready good-first-issue help-wanted
References
- [OpenTelemetry Go Documentation](https://opentelemetry.io/docs/instrumentation/go/)
- [Prometheus Best Practices](https://prometheus.io/docs/practices/naming/)
- [OTEL Semantic Conventions](https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/)
Priority: High - Observability is crucial for production message broker deployments
Effort: Large - Comprehensive feature requiring instrumentation across the codebase
Impact: High - Enables enterprise adoption and production monitoring capabilities