Commit 771b9b0

set ContentLength when forwarding requests
HTTP/2 clients (e.g. Java HttpClient with HTTP_2 version) often omit the Content-Length header since HTTP/2 uses DATA frames for body framing. When DMR's reverse proxy forwards such requests to the backend via HTTP/1.1, it uses Transfer-Encoding: chunked (ContentLength == -1), which vLLM's Python/uvicorn server fails to parse — resulting in an empty body and a 422 Unprocessable Entity response.

Fix by explicitly setting ContentLength = len(body) on the upstream request after replacing the body with the already-buffered bytes. This ensures a Content-Length header is always sent, consistent with how the Ollama and Anthropic handlers already handle this.

llama.cpp was unaffected because its C/C++ HTTP server handles chunked encoding gracefully.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
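The framing behavior described above can be reproduced with Go's standard library alone. The sketch below is illustrative and not part of DMR: the `frame` helper and its throwaway `httptest` backend are made up for the demo. It replaces an outgoing request's body with pre-buffered bytes (as the proxy does) and shows that with `ContentLength == -1` the HTTP/1.1 transport falls back to `Transfer-Encoding: chunked`, while setting `ContentLength = int64(len(body))` makes it send a plain `Content-Length` header.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// frame sends body to a throwaway backend and reports how the body arrived:
// the Content-Length the server saw and any transfer encoding applied.
func frame(body []byte, setContentLength bool) string {
	reportCh := make(chan string, 1)
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body)
		reportCh <- fmt.Sprintf("ContentLength=%d TransferEncoding=%v",
			r.ContentLength, r.TransferEncoding)
	}))
	defer backend.Close()

	req, err := http.NewRequest(http.MethodPost, backend.URL, nil)
	if err != nil {
		panic(err)
	}
	// Mimic the proxy: the body is already buffered, so install it directly.
	req.Body = io.NopCloser(bytes.NewReader(body))
	if setContentLength {
		// The fix: declare the length so a Content-Length header is sent.
		req.ContentLength = int64(len(body))
	} else {
		// Unknown length, as when an HTTP/2 client omitted Content-Length;
		// the HTTP/1.1 transport then uses Transfer-Encoding: chunked.
		req.ContentLength = -1
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return <-reportCh
}

func main() {
	body := []byte(`{"model":"test"}`)
	fmt.Println("without fix:", frame(body, false))
	fmt.Println("with fix:   ", frame(body, true))
}
```

Running this shows the backend receiving a chunked request (ContentLength -1) in the first case and an explicit length with no transfer encoding in the second — the difference that trips up vLLM's server but not llama.cpp's.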
1 parent 425d03e commit 771b9b0

File tree

1 file changed: +6 −0 lines

pkg/inference/scheduling/http_handler.go

Lines changed: 6 additions & 0 deletions
@@ -271,8 +271,14 @@ func (h *HTTPHandler) handleOpenAIInference(w http.ResponseWriter, r *http.Reque
 	}()

 	// Create a request with the body replaced for forwarding upstream.
+	// Set ContentLength explicitly so the backend always receives a Content-Length
+	// header. Without this, HTTP/2 requests (where clients may omit Content-Length)
+	// are forwarded with Transfer-Encoding: chunked, which some backends (e.g.
+	// vLLM's Python/uvicorn server) fail to parse, resulting in an empty body and
+	// a 422 response.
 	upstreamRequest := r.Clone(r.Context())
 	upstreamRequest.Body = io.NopCloser(bytes.NewReader(body))
+	upstreamRequest.ContentLength = int64(len(body))

 	// Perform the request.
 	runner.ServeHTTP(w, upstreamRequest)
