
Fix /api{/v1}/osquery/log and /api{/v1}/osquery/distributed/write content-length exceeded errors #40813

@BCTBB

Description


Fleet version: v4.81.0

Web browser and operating system: N/A


💥  Actual behavior

Since 4.81.0 was applied, we've noticed an increase in cron (usage_statistics) failure alerts to help-p1 for customer-numa. Cleaning up error:*:json and error:*:count clears the alerts until the number of keys grows again.

The error:*:json keys have the following body:

"[\n  {\n    \"message\": \"request body read error: read tcp <redacted_ip>:8080-\\u003e<redacted_ip>:40940: i/o timeout\"\n  },\n  {\n    \"message\": \"missing FleetError in chain\",\n    \"data\": {\n      \"timestamp\": \"2026-02-26T20:50:55Z\"\n    },\n    \"stack\": [\n      \"github.com/fleetdm/fleet/v4/server/platform/endpointer.EncodeError (transport_error.go:78)\",\n      \"github.com/fleetdm/fleet/v4/server/service.fleetErrorEncoder (transport_error.go:122)\",\n      \"github.com/go-kit/kit/transport/http.Server.ServeHTTP (server.go:117)\",\n      \"github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func2 (instrument_server.go:255)\",\n      \"net/http.HandlerFunc.ServeHTTP (server.go:2322)\",\n      \"github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1 (instrument_server.go:296)\",\n      \"net/http.HandlerFunc.ServeHTTP (server.go:2322)\",\n      \"github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1 (instrument_server.go:147)\",\n      \"net/http.HandlerFunc.ServeHTTP (server.go:2322)\",\n      \"github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2 (instrument_server.go:109)\",\n      \"net/http.HandlerFunc.ServeHTTP (server.go:2322)\"\n    ]\n  }\n]"

In the Fleet logs, the error looks like the following:

{"component":"http","err":"request body read error: read tcp <redacted_ip>:8080-><redacted_ip>:62524: i/o timeout","level":"info","path":"/api/v1/osquery/distributed/write","took":"25.000126271s","ts":"2026-02-24T07:45:54.817144386Z","uuid":"<redacted>"}

Also in the Fleet logs, the following error messages began to appear after 4.81.0 and are likely related to the errors above.

{"component":"http","err":"Request exceeds the max size limit of 1.049MB. Configure the limit: https://fleetdm.com/docs/configuration/fleet-server-configuration#server-default-max-request-body-size","internal":"Request exceeds the max size limit of 1.049MB, Incoming Content-Length: 105.2MB","level":"info","path":"/api/osquery/log","took":"2.626835901s","ts":"2026-03-02T15:27:16.062914895Z"}

🛠️ To fix

  1. We need to remove the size limits on the following two osquery endpoints (see WithRequestBodySizeLimit in handler.go):
  • /api/{v1/}osquery/distributed/write
  • /api/{v1/}osquery/log
  2. Fix them both similarly to how we fixed the /api/{v1/}osquery/carves/block endpoint in Authenticate carve block endpoint before parsing the "data" field #39353. What's the fix? It requires raw JSON parsing of the request body to extract the node key and authenticate it before reading the rest of the JSON body (to prevent DDoS attacks).

🧑‍💻  Steps to reproduce

These steps:

  • Have been confirmed to consistently lead to reproduction in multiple Fleet instances.
  • Describe the workflow that led to the error, but have not yet been reproduced in multiple Fleet instances.
  1. TODO
  2. TODO

🕯️ More info (optional)

  1. Additional details 🧵
