Description
Component(s)
receiver/datadog
What happened?
Bug Report: Service Check Endpoint Fails to Unmarshal Single Object Payloads
Component
receiver/datadog
Describe the bug
The Datadog receiver's /api/v1/check_run endpoint fails when receiving a single service check object {...} instead of an array [{...}], causing intermittent unmarshal errors.
Steps to Reproduce
- Configure Datadog Agent 7.72.2 with DD_DD_URL=http://localhost:7000 to forward to the OTEL collector
- The OTEL collector runs the Datadog receiver on port 7000
- The Datadog agent sends periodic connectivity health checks to all endpoints
- The /api/v1/check_run health check probe sends a single object payload
Expected behavior
The receiver should handle both:
- Array payloads: [{...}, {...}] (normal case)
- Single object payloads: {...} (edge case)
This pattern already exists in the codebase: the logs endpoint (handleLogs) uses defensive parsing to handle both formats gracefully.
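As a sketch of the expected behavior, the decoder below accepts both shapes. It is a variant, not the handleLogs pattern: instead of trying an array and falling back to an object, it checks the first non-whitespace byte and unmarshals once. ServiceCheck is a simplified, hypothetical stand-in for translator.ServiceCheck, and decodeServiceChecks is a hypothetical name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// ServiceCheck is a simplified, hypothetical stand-in for translator.ServiceCheck.
type ServiceCheck struct {
	Check  string `json:"check"`
	Status int    `json:"status"`
}

// decodeServiceChecks accepts either a JSON array of service checks or a single
// service check object. The first non-whitespace byte selects the target type,
// so each body is unmarshaled once and a malformed array reports its own error.
func decodeServiceChecks(body []byte) ([]ServiceCheck, error) {
	// JSON whitespace is space, tab, CR, and LF.
	trimmed := bytes.TrimLeft(body, " \t\r\n")
	if len(trimmed) == 0 {
		return nil, errors.New("empty service check payload")
	}
	switch trimmed[0] {
	case '[':
		var services []ServiceCheck
		if err := json.Unmarshal(trimmed, &services); err != nil {
			return nil, err
		}
		return services, nil
	case '{':
		var service ServiceCheck
		if err := json.Unmarshal(trimmed, &service); err != nil {
			return nil, err
		}
		// Wrap the single object so callers always get a slice.
		return []ServiceCheck{service}, nil
	default:
		return nil, fmt.Errorf("unexpected service check payload starting with %q", trimmed[0])
	}
}

func main() {
	for _, body := range []string{
		`[{"check":"a","status":0},{"check":"b","status":2}]`,
		`{"check":"a","status":0}`,
		`"not a check"`,
	} {
		services, err := decodeServiceChecks([]byte(body))
		fmt.Println(len(services), err)
	}
}
```

Compared with try-array-then-object, this avoids a second Unmarshal and keeps the original error for a malformed array. The trade-off is that it diverges from the handleLogs pattern the issue proposes to reuse.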
Actual behavior
Error logged every ~10 minutes:
{
"level":"error",
"ts":"2025-12-15T19:16:49.180Z",
"msg":"json: cannot unmarshal object into Go value of type []translator.ServiceCheck",
"otelcol.component.id":"datadog",
"otelcol.component.kind":"receiver",
"otelcol.signal":"metrics"
}
Environment
- OTEL Collector Version: v0.134.0 and v0.141.0 (tested both)
- Datadog Agent Version: 7.72.2
- Platform: Kubernetes (GKE)
- Setup: Datadog Agent → OTEL Collector (Datadog receiver) → Dynatrace (OTLP exporter)
Root Cause
The handleCheckRun function in receiver/datadogreceiver/receiver.go (lines 410-417 in v0.134.0) only attempts to unmarshal as an array:
var services []translator.ServiceCheck
err = json.Unmarshal(buf.Bytes(), &services)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
ddr.params.Logger.Error(err.Error())
return
}
Proposed Solution
Apply the same defensive parsing pattern already used in handleLogs (lines 304-316):
// Try parsing as array first, then single service check
var services []translator.ServiceCheck
err = json.Unmarshal(buf.Bytes(), &services)
if err != nil {
// Now try parsing as a single service check
var service translator.ServiceCheck
err = json.Unmarshal(buf.Bytes(), &service)
if err != nil {
http.Error(w, "unable to unmarshal service checks", http.StatusBadRequest)
ddr.params.Logger.Error("unable to unmarshal service checks", zap.Error(err))
return
}
services = append(services, service)
}
Verification
We tested this fix by patching v0.134.0 locally and deploying to production. The error no longer occurs.
Patch file: Available if needed for reference
Impact
- Severity: Low (cosmetic error, no data loss)
- Frequency: ~1% of service check transactions (every ~10 minutes)
- Affected Users: Anyone using the Datadog Agent with DD_DD_URL pointing to an OTEL collector
Additional Context
- The Datadog agent's normal service check payloads are always arrays (verified in DD agent source)
- The single object format appears to come from DD agent's connectivity health checks
- Setting DD_ENABLE_PAYLOADS_SERVICE_CHECKS=false does NOT resolve the issue (indicating the payload comes from the agent's internal diagnostics)
- The fix pattern already exists in the same file for the logs endpoint
Related Documentation
- Datadog Service Checks API: https://docs.datadoghq.com/api/latest/service-checks/
- OTEL Datadog Receiver: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/datadogreceiver
Collector version
v0.134.0 and v0.141.0 (tested both)
Environment information
Environment
OpenTelemetry Collector configuration
...
receivers:
  datadog:
    endpoint: localhost:7000
    read_timeout: 60s
    write_timeout: 60s
    intake:
      behavior: proxy
      proxy:
        api:
          key: "<redacted>"
          site: "datadoghq.com"
          fail_on_invalid_key: false
...
Log output
{"level":"error","ts":"2025-12-16T17:54:19.751Z","msg":"json: cannot unmarshal object into Go value of type []translator.ServiceCheck","resource":{"service.instance.id":"df498ea6-cdcb-459c-849e-e9ca94e1de5a","service.name":"otelcol-contrib","service.version":"0.134.0"},"otelcol.component.id":"datadog","otelcol.component.kind":"receiver","otelcol.signal":"metrics"}
Additional context
No response