[receiver/datadogreceiver] Service check endpoint fails to unmarshal single object payloads #44986

@dmytrohysht

Component(s)

receiver/datadog

What happened?

Bug Report: Service Check Endpoint Fails to Unmarshal Single Object Payloads

Describe the bug

The Datadog receiver's /api/v1/check_run endpoint rejects a payload that is a single service check object {...} rather than an array [{...}]. The request is answered with HTTP 400 and an unmarshal error is logged, which shows up intermittently.

Steps to Reproduce

  1. Configure Datadog Agent 7.72.2 with DD_DD_URL=http://localhost:7000 so that it forwards to the OTEL Collector
  2. Run the OTEL Collector with the Datadog receiver listening on port 7000
  3. The Datadog Agent sends periodic connectivity health checks to all endpoints
  4. The health check probe for /api/v1/check_run sends a single object payload

Expected behavior

The receiver should handle both:

  • Array payloads: [{...}, {...}] (normal case)
  • Single object payloads: {...} (edge case)

This pattern already exists in the codebase - the logs endpoint (handleLogs) uses defensive parsing to handle both formats gracefully.

Actual behavior

The following error is logged roughly every 10 minutes:

{
  "level":"error",
  "ts":"2025-12-15T19:16:49.180Z",
  "msg":"json: cannot unmarshal object into Go value of type []translator.ServiceCheck",
  "otelcol.component.id":"datadog",
  "otelcol.component.kind":"receiver",
  "otelcol.signal":"metrics"
}

Environment

  • OTEL Collector Version: v0.134.0 and v0.141.0 (tested both)
  • Datadog Agent Version: 7.72.2
  • Platform: Kubernetes (GKE)
  • Setup: Datadog Agent → OTEL Collector (Datadog receiver) → Dynatrace (OTLP exporter)

Root Cause

The handleCheckRun function in receiver/datadogreceiver/receiver.go (lines 410-417 in v0.134.0) only attempts to unmarshal the request body as an array:

var services []translator.ServiceCheck
err = json.Unmarshal(buf.Bytes(), &services)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    ddr.params.Logger.Error(err.Error())
    return
}

Proposed Solution

Apply the same defensive parsing pattern already used in handleLogs (lines 304-316):

// Try parsing as array first, then single service check
var services []translator.ServiceCheck
err = json.Unmarshal(buf.Bytes(), &services)
if err != nil {
    // Now try parsing as a single service check
    var service translator.ServiceCheck
    err = json.Unmarshal(buf.Bytes(), &service)
    if err != nil {
        http.Error(w, "unable to unmarshal service checks", http.StatusBadRequest)
        ddr.params.Logger.Error("unable to unmarshal service checks", zap.Error(err))
        return
    }
    services = append(services, service)
}
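
For trying the fallback in isolation, the same idea can be sketched as a standalone helper. This is an illustration under assumptions rather than the actual patch: ServiceCheck is a hypothetical stand-in for translator.ServiceCheck, and decodeServiceChecks is a name invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceCheck is a hypothetical stand-in for translator.ServiceCheck.
// The field set is an assumption made for this example only.
type ServiceCheck struct {
	Check    string   `json:"check"`
	HostName string   `json:"host_name"`
	Status   int      `json:"status"`
	Tags     []string `json:"tags,omitempty"`
}

// decodeServiceChecks accepts either a JSON array of service checks or a
// single service check object, and always returns a slice.
func decodeServiceChecks(body []byte) ([]ServiceCheck, error) {
	// Try the normal case first: an array of service checks.
	var services []ServiceCheck
	if err := json.Unmarshal(body, &services); err == nil {
		return services, nil
	}

	// Fall back to a single service check object.
	var service ServiceCheck
	if err := json.Unmarshal(body, &service); err != nil {
		return nil, fmt.Errorf("unable to unmarshal service checks: %w", err)
	}
	return []ServiceCheck{service}, nil
}

func main() {
	payloads := []string{
		`[{"check":"app.ok","host_name":"h1","status":0}]`, // array payload
		`{"check":"app.ok","host_name":"h1","status":0}`,   // single object payload
		`"not a service check"`,                            // neither form
	}
	for _, body := range payloads {
		services, err := decodeServiceChecks([]byte(body))
		fmt.Println(len(services), err)
	}
}
```

The array attempt comes first so the common path costs a single unmarshal. The second attempt only runs for payloads the array decode rejects, and a payload that matches neither form still results in an error.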

Verification

We tested this fix by patching v0.134.0 locally and deploying to production. The error no longer occurs.

Patch file: Available if needed for reference

Impact

  • Severity: Low (cosmetic error, no data loss)
  • Frequency: ~1% of service check transactions (every ~10 minutes)
  • Affected Users: Anyone using Datadog Agent with DD_DD_URL pointing to OTEL collector

Additional Context

  • The Datadog Agent's normal service check payloads are always arrays (verified in the Datadog Agent source)
  • The single object format appears to come from the Datadog Agent's connectivity health checks
  • Setting DD_ENABLE_PAYLOADS_SERVICE_CHECKS=false does NOT resolve the issue, which indicates the payload comes from the agent's internal diagnostics rather than from regular service check forwarding
  • The fix pattern already exists in the same file for the logs endpoint

Collector version

v0.134.0 and v0.141.0 (tested both)

OpenTelemetry Collector configuration

...
receivers:
  datadog:
    endpoint: localhost:7000
    read_timeout: 60s
    write_timeout: 60s
    intake:
      behavior: proxy
      proxy:
        api:
          key: "<redacted>"
          site: "datadoghq.com"
          fail_on_invalid_key: false
...

Log output

{"level":"error","ts":"2025-12-16T17:54:19.751Z","msg":"json: cannot unmarshal object into Go value of type []translator.ServiceCheck","resource":{"service.instance.id":"df498ea6-cdcb-459c-849e-e9ca94e1de5a","service.name":"otelcol-contrib","service.version":"0.134.0"},"otelcol.component.id":"datadog","otelcol.component.kind":"receiver","otelcol.signal":"metrics"}
