Add configurable LLM timeout via environment variables #1281

tomaszzmuda wants to merge 2 commits into khoj-ai:master
Conversation
```diff
-await websocket.send_text(chunks)
-await websocket.send_text(ChatEvent.END_EVENT.value)
+try:
+    await websocket.send_text(chunks)
+    await websocket.send_text(ChatEvent.END_EVENT.value)
+except RuntimeError:
+    pass  # WebSocket already closed
```
Is this try/catch in delayed_flush func really required? What happens without it?
My app froze and became unresponsive. I've been testing this the whole day, and this change fixes it. The frontend wasn't working, nor was GET on /, so my Kubernetes killed the app. It made a really bad first impression of the app, because I'm using a rather slow offline model.
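The failure mode can be sketched with a stand-in for Starlette's WebSocket, which raises `RuntimeError` if `send_text` is called after the close handshake. The class and function names here are illustrative, not Khoj's actual code:

```python
import asyncio


class StubWebSocket:
    """Illustrative stand-in for Starlette's WebSocket."""

    def __init__(self):
        self.closed = False
        self.sent = []

    async def send_text(self, text: str) -> None:
        # Starlette raises RuntimeError when sending after close.
        if self.closed:
            raise RuntimeError('Cannot call "send" once a close message has been sent.')
        self.sent.append(text)


async def flush(ws, chunks: str, end_event: str) -> None:
    # Without the try/except, the RuntimeError propagates out of the
    # background flush task instead of being swallowed.
    try:
        await ws.send_text(chunks)
        await ws.send_text(end_event)
    except RuntimeError:
        pass  # WebSocket already closed; nothing left to flush


ws = StubWebSocket()
ws.closed = True  # simulate the client disconnecting mid-response
asyncio.run(flush(ws, "partial response", "end"))  # completes without raising
print(ws.sent)  # → []
```

When the socket is still open, both messages go through; when it has closed, the flush becomes a no-op instead of an unhandled exception.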
```python
        httpx.Timeout configured with appropriate values
    """
    connect_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_CONNECT", "30"))
    read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", "60"))
```
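Filled out into a runnable sketch (with a `SimpleNamespace` standing in for `httpx.Timeout` so the snippet has no dependencies; the write/pool values are assumptions, not taken from the PR):

```python
import os
from types import SimpleNamespace


def Timeout(connect, read, write, pool):
    # Stand-in for httpx.Timeout; the real helper returns an httpx.Timeout.
    return SimpleNamespace(connect=connect, read=read, write=write, pool=pool)


def get_llm_timeout():
    """Read LLM timeouts from the environment, falling back to the defaults."""
    connect_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_CONNECT", "30"))
    read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", "60"))
    # The write/pool values below are illustrative assumptions.
    return Timeout(connect=connect_timeout, read=read_timeout, write=60.0, pool=5.0)


os.environ["KHOJ_LLM_TIMEOUT_READ"] = "300"  # e.g. for a slow offline model
timeout = get_llm_timeout()
print(timeout.connect, timeout.read)  # → 30.0 300.0
```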
Should we still set a longer default llm read timeout when using local ai models?
```diff
-    read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", "60"))
+    default_read_timeout = 300 if is_local_api(api_base_url) else 60
+    read_timeout = float(os.getenv("KHOJ_LLM_TIMEOUT_READ", default_read_timeout))
```
Note: The function description comment above and the setup.mdx docs will need updating to reflect the new defaults (i.e. read timeout will be 300 for local AI, 60 otherwise).
IMO having a method like "is_local_api" is not a good approach. It's really hard to define what is "local" and what isn't. In my case I run the LLM in a separate Docker container, and that code only checks for localhost or 127.0.0.1. On the other end, I have some models on Azure AI Foundry with really small rate limits that constantly finish their responses after that 60-second mark.
This should be defined as a global variable (and I know that is kind of a breaking change), but it would be much more convenient for new users. A possible next step would be to add a timeout setting in the UI when defining a model.
Of course it's up to you, and I'll be fine whether the timeout is set to 300 or not. It's just really confusing that the code has to check for a specific domain in order for the application to behave differently.
Problem
The LLM timeout is currently hardcoded with an arbitrary distinction between "local" (300s) and "remote" (60s) APIs based on whether the API base URL is localhost/127.0.0.1. This causes issues in real-world deployments.
Solution
Add two environment variables to configure LLM timeouts, removing the localhost distinction entirely:
| Variable | Default |
| --- | --- |
| `KHOJ_LLM_TIMEOUT_READ` | 60 |
| `KHOJ_LLM_TIMEOUT_CONNECT` | 30 |

Implementation
Added a `get_llm_timeout()` helper function in `src/khoj/processor/conversation/openai/utils.py` that reads these environment variables and returns an `httpx.Timeout` configuration. All 5 timeout usages in the file now call this helper.

Backward Compatibility
To keep the previous 300s behavior for local models, set `KHOJ_LLM_TIMEOUT_READ=300`.

---EDIT---
After a timeout happened, the application tried to inform the frontend, but if the websocket was already closed it hung, so I added a small fix for that.