fix(community): fix real-time streaming buffer bug for DeepInfra LLM Inference Provider by Sumitk99 · Pull Request #559 · langchain-ai/langchain-community

Sumit Kumar (Sumitk99) · 2026-02-26T05:28:31Z

Description:
In DeepInfra (langchain_community/llms/deepinfra.py), the _stream() and
_astream() methods called response.text / await response.text() before
iterating over the SSE stream. This consumed the entire HTTP response body
upfront — so all tokens were buffered in memory and delivered at once. It looked
like streaming on the surface, but every chunk arrived with Δ0.000s between
them (i.e. a bulk response in disguise).
Root cause:

# Buggy: response.text reads the entire body before iteration starts
response_text = response.text
self._handle_body_errors(response_text)
self._handle_status(response.status_code, response.text)
for line in _parse_stream(response.iter_lines()):  # body is already buffered, so no real-time chunks
    ...
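
The ordering problem can be illustrated without any network access. The sketch below is a minimal illustration, not the DeepInfra code: `fake_sse_source`, `consume_buffered` and `consume_streaming` are hypothetical names, and a plain generator stands in for the HTTP response. It records when each token is produced and when it is consumed, which shows why draining the source first removes any gap between chunks.

```python
from typing import Iterator, List


def fake_sse_source(tokens: List[str], log: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a network stream; logs when each token is produced."""
    for tok in tokens:
        log.append(f"produced:{tok}")
        yield tok


def consume_buffered(tokens: List[str]) -> List[str]:
    """Mirrors the buggy shape: drain the whole source, then iterate over the copy."""
    log: List[str] = []
    body = list(fake_sse_source(tokens, log))  # analogous to reading response.text first
    for tok in body:
        log.append(f"consumed:{tok}")
    return log


def consume_streaming(tokens: List[str]) -> List[str]:
    """Mirrors the fixed shape: handle each token as soon as it is produced."""
    log: List[str] = []
    for tok in fake_sse_source(tokens, log):
        log.append(f"consumed:{tok}")
    return log


buffered_log = consume_buffered(["a", "b", "c"])
streaming_log = consume_streaming(["a", "b", "c"])
```

In the buffered variant every `produced` entry precedes the first `consumed` entry; in the streaming variant the two alternate, which is the behavior the fix restores.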

Fix:

Pass stream=True to request.post() so the response body is not downloaded eagerly and can be read incrementally
Remove the pre-iteration response.text call; check the HTTP status without reading the body
Move server-side error detection inside the per-chunk loop

# Fixed — status checked without consuming body; chunks arrive in real-time
response = request.post(url=..., data=..., stream=True)
self._handle_status(response.status_code, "")
for line in _parse_stream(response.iter_lines()):
    if line and "error" in line:
        self._handle_body_errors(line)
    ...
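
One possible refinement of the per-chunk check, sketched here under stated assumptions rather than taken from this PR: a substring test such as `"error" in line` could also match a generated token that happens to contain the word "error". The standalone sketch below parses each SSE `data:` line as JSON and only treats a top-level `error` key as a failure. The names `StreamError` and `iter_sse_payloads`, the `[DONE]` sentinel and the payload shape are all assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Hypothetical exception type for an error reported mid-stream."""


def iter_sse_payloads(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines, raising on an error payload."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            return
        payload = json.loads(data)
        # Only a top-level "error" key counts, not the word inside generated text.
        if isinstance(payload, dict) and "error" in payload:
            raise StreamError(str(payload["error"]))
        yield payload


sample = [
    b'data: {"token": {"text": "no error here"}}',
    b"",
    b"data: [DONE]",
]
chunks = list(iter_sse_payloads(sample))
```

Because the function is a generator, it can wrap `response.iter_lines()` directly and still surface each chunk as it arrives.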

Issue: N/A
Dependencies: None — no new dependencies introduced.

fix(community): fix real-time streaming buffer bug in DeepInfra LLM a…

a82458b

…nd ChatDeepInfra. The code on the main branch starts streaming only after it has consumed all tokens; it looks like streaming but is not. Fixed now.

github-actions bot added the fix label Feb 26, 2026

Sumit Kumar (Sumitk99) wants to merge 1 commit into langchain-ai:main from Sumitk99:fix/deepinfra-streaming-buffer-bug
