Add configurable LLM timeout via environment variables#1281

Open
tomaszzmuda wants to merge 2 commits into khoj-ai:master from tomaszzmuda:timeouts
Conversation

tomaszzmuda commented Mar 18, 2026

Problem

The LLM timeout is currently hardcoded with an arbitrary distinction between "local" (300s) and "remote" (60s) APIs based on whether the API base URL is localhost/127.0.0.1. This causes issues in real-world deployments:

  1. Docker containers on internal networks are "local" in a network sense but not on localhost, so they get the 60s timeout
  2. Local LLM instances with large models may need longer than 300s for generation
  3. Remote APIs with high latency may exceed the 60s timeout
  4. Users cannot adapt to their specific infrastructure without code changes

Solution

Add two environment variables to configure LLM timeouts, removing the localhost distinction entirely:

| Variable | Default | Description |
| --- | --- | --- |
| `KHOJ_LLM_TIMEOUT_READ` | 60 | Read timeout for all LLM API calls |
| `KHOJ_LLM_TIMEOUT_CONNECT` | 30 | Connection timeout for all LLM API calls |

Implementation

Added a get_llm_timeout() helper function in src/khoj/processor/conversation/openai/utils.py that reads these environment variables and returns an httpx.Timeout configuration. All 5 timeout usages in the file now call this helper.
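A minimal sketch of such a helper, based on the env-var names and defaults described above. The real change returns an `httpx.Timeout`; this sketch returns a plain `(connect, read)` tuple to stay dependency-free, and the function body is an assumption rather than the PR's exact code.

```python
import os

def get_llm_timeout() -> tuple[float, float]:
    """Read LLM API timeouts from the environment, falling back to defaults.

    Returns a (connect_timeout, read_timeout) pair in seconds. The actual
    PR wraps these values in httpx.Timeout(connect=..., read=...).
    """
    connect_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_CONNECT", "30"))
    read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", "60"))
    return (connect_timeout, read_timeout)

# A user relying on the old 300s "local" behaviour just sets the variable:
os.environ["KHOJ_LLM_TIMEOUT_READ"] = "300"
print(get_llm_timeout())  # (30.0, 300.0)
```

Because the helper re-reads the environment on every call, all five call sites in the file pick up the configured values without any change to their signatures.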

Backward Compatibility

  • ✅ Default 60s read timeout matches previous "remote" behavior
  • ✅ No breaking changes to API or function signatures
  • ✅ Users who relied on 300s "local" timeout can set KHOJ_LLM_TIMEOUT_READ=300
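For example, a deployment that previously relied on the 300s "local" timeout could restore it like this (values are illustrative):

```shell
# Restore the previous "local" behaviour: allow slow local models up to 300s
export KHOJ_LLM_TIMEOUT_READ=300
# Keep the connection timeout at its default
export KHOJ_LLM_TIMEOUT_CONNECT=30
```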

---EDIT---
After a timeout occurred, the application tried to inform the frontend, but if the websocket was already closed it hung, so I included a small fix for that as well.

Comment on lines -1711 to 1716
```diff
-await websocket.send_text(chunks)
-await websocket.send_text(ChatEvent.END_EVENT.value)
+try:
+    await websocket.send_text(chunks)
+    await websocket.send_text(ChatEvent.END_EVENT.value)
+except RuntimeError:
+    pass  # WebSocket already closed
```
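The defensive-send pattern in the diff above can be sketched generically. The `ClosedSocket` stand-in and the `delayed_flush` signature here are illustrative, not the PR's actual code; the relevant fact is that Starlette's `WebSocket.send_text` raises `RuntimeError` once the connection has been closed, so the flush coroutine must swallow it to avoid crashing the handler.

```python
import asyncio

class ClosedSocket:
    """Stand-in for a WebSocket whose peer has already disconnected."""
    async def send_text(self, data: str) -> None:
        raise RuntimeError("Cannot send after close")

async def delayed_flush(ws, chunks: str) -> str:
    """Flush buffered chunks; tolerate the socket having closed underneath us."""
    try:
        await ws.send_text(chunks)
    except RuntimeError:
        return "dropped"  # WebSocket already closed; nothing left to do
    return "sent"

print(asyncio.run(delayed_flush(ClosedSocket(), "hello")))  # dropped
```

Without the `except`, the exception would propagate out of the flush task, which matches the hang/unresponsiveness the author reports below.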

Member

Is this try/catch in delayed_flush func really required? What happens without it?

Author

My app froze and became unresponsive. I've been testing this fix all day, and it resolves the problem. The frontend wasn't working, nor was GET on /, so my Kubernetes killed the app. It made for a really bad first impression of the app, because I'm using a rather slow offline model.

```python
        httpx.Timeout configured with appropriate values
    """
    connect_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_CONNECT", "30"))
    read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", "60"))
```
Member

Should we still set a longer default llm read timeout when using local ai models?

Suggested change

```diff
-read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", "60"))
+default_read_timeout = 300 if is_local_api(api_base_url) else 60
+read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", default_read_timeout))
```

Note: The function description comment above and the setup.mdx updates will need to be updated to reflect the updated defaults (i.e. the read timeout will be 300 for local AI, 60 otherwise).

Author

IMO, having a method like `is_local_api` is not a good approach. It's really hard to define what is "local" and what's not. In my case I run the LLM in a separate Docker container, and that code only checks for localhost or 127.0.0.1. At the other extreme, I have some models on Azure AI Foundry with really small rate limits that constantly finish responses after that 60-second mark.
This should be defined here as a global variable (and I know that is kind of a breaking change), but it would be much more convenient for new users. A possible next step would be to add a timeout setting to the UI when defining a model.

Author

Of course it's up to you, and I'll be fine whether the timeout is set to 300 or not. It's just really confusing that the code has to check for a specific domain for the application to behave differently.
