Add configurable LLM timeout via environment variables #1281
base: master
```diff
@@ -1708,17 +1708,23 @@ async def delayed_flush():
     await asyncio.sleep(BUFFER_FLUSH_INTERVAL)
     # Check if there's still content to flush
     chunks = "".join([chunk async for chunk in flush_message_buffer()])
-    await websocket.send_text(chunks)
-    await websocket.send_text(ChatEvent.END_EVENT.value)
+    try:
+        await websocket.send_text(chunks)
+        await websocket.send_text(ChatEvent.END_EVENT.value)
+    except RuntimeError:
+        pass  # WebSocket already closed
```
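The guard this hunk adds can be sketched in isolation. In the following minimal illustration, `FakeWebSocket` and `safe_send` are hypothetical stand-ins written for this example (they are not part of the PR); the point is that sending on an already-closed WebSocket raises `RuntimeError`, which the handler now swallows instead of crashing the task:

```python
# Sketch (with assumed names) of the pattern in this PR: swallow the
# RuntimeError that sending on an already-closed WebSocket raises.
import asyncio


class FakeWebSocket:
    """Stand-in for a real WebSocket, for illustration only."""

    def __init__(self):
        self.closed = False
        self.sent = []

    async def send_text(self, text: str):
        if self.closed:
            # Starlette-style behavior: sending after close raises RuntimeError
            raise RuntimeError("Cannot send after the WebSocket is closed")
        self.sent.append(text)


async def safe_send(ws, text: str) -> bool:
    """Send text; return False (instead of raising) if the socket is closed."""
    try:
        await ws.send_text(text)
        return True
    except RuntimeError:
        return False  # WebSocket already closed


async def main():
    ws = FakeWebSocket()
    assert await safe_send(ws, "chunk") is True
    ws.closed = True
    # No exception propagates even though the socket is closed:
    assert await safe_send(ws, "END") is False
    return ws.sent


sent = asyncio.run(main())
print(sent)  # ['chunk']
```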
Comment on lines 1711 to 1716

Member: Is this try/catch in …
Author: My app froze and stopped responding. I've been testing this fix all day, and it resolves the problem. The frontend wasn't working, and neither was GET on /, so Kubernetes kept killing the app. It made a really bad first impression of the app, because I'm using a rather slow offline model.
```diff
     # Flush buffer if no new messages arrive within debounce interval
     message_buffer.timeout = asyncio.create_task(delayed_flush())
 except asyncio.CancelledError:
     logger.debug(f"Chat request cancelled for user {websocket.scope['user'].object.id}")
     raise
 except Exception as e:
-    await websocket.send_text(json.dumps({"error": "Internal server error"}))
     logger.error(f"Error processing chat request: {e}", exc_info=True)
+    try:
+        await websocket.send_text(json.dumps({"error": "Internal server error"}))
+    except RuntimeError:
+        pass  # WebSocket already closed
     raise
```
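For context, the debounce pattern that `delayed_flush()` participates in can be sketched roughly as follows. `MessageBuffer` and the interval value here are simplified stand-ins for illustration, not the project's actual implementation: each new chunk cancels the pending flush task and schedules a fresh one, so the buffer only flushes after a quiet period.

```python
# Sketch (with assumed names) of a debounced buffer flush: a pending flush
# task is cancelled and rescheduled whenever a new chunk arrives.
import asyncio

BUFFER_FLUSH_INTERVAL = 0.05  # seconds; illustrative value


class MessageBuffer:
    def __init__(self):
        self.chunks = []
        self.timeout = None   # pending flush task, if any
        self.flushed = []     # completed flushes, for demonstration

    def add(self, chunk: str):
        self.chunks.append(chunk)
        if self.timeout is not None:
            self.timeout.cancel()  # debounce: restart the quiet-period timer
        self.timeout = asyncio.create_task(self._delayed_flush())

    async def _delayed_flush(self):
        await asyncio.sleep(BUFFER_FLUSH_INTERVAL)
        self.flushed.append("".join(self.chunks))
        self.chunks.clear()


async def main():
    buf = MessageBuffer()
    buf.add("hel")
    buf.add("lo")  # arrives within the interval: the first timer is cancelled
    await asyncio.sleep(0.2)  # let the surviving timer fire
    return buf.flushed


flushed = asyncio.run(main())
print(flushed)  # ['hello'] — one flush, not two
```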
Member: Should we still set a longer default llm read timeout when using local ai models?

Note: The function description comment above and the setup.mdx updates will need to be updated to reflect the updated defaults (i.e. read timeout will be 300 for local ai, 60 otherwise).
Author: IMO having a method like "is_local_api" is not a good approach. It's really hard to define what is "local" and what isn't. In my case I run the LLM in a separate Docker container, and that code only checks for localhost or 127.0.0.1. At the other end, I have some models on Azure AI Foundry with really small rate limits, which constantly produce responses after that 60-second timeout.

The timeout should be defined here as a global variable (I know that's kind of a breaking change), but it would be much more convenient for new users. A possible next step would be to add a timeout setting in the UI when defining a model.
Author: Of course it's up to you, and I'll be fine whether the timeout is set to 300 or not. It's just really confusing that the code has to check whether the endpoint is a specific domain for the application to behave differently.
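The env-var approach argued for in this thread can be sketched like this. The variable name `LLM_READ_TIMEOUT` and the default of 60 seconds are assumptions for illustration, not necessarily what this PR ships; the idea is simply to let deployments override the timeout instead of inferring it from the API host:

```python
# Sketch (with an assumed variable name) of a configurable LLM read timeout:
# read it from the environment, falling back to a default.
import os


def llm_read_timeout(default: float = 60.0) -> float:
    """Return the LLM read timeout in seconds, from LLM_READ_TIMEOUT if set."""
    raw = os.getenv("LLM_READ_TIMEOUT")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default  # ignore malformed values rather than crash at startup


# A slow offline model or a rate-limited remote endpoint can now just set
# LLM_READ_TIMEOUT=300 in its deployment, with no host-based special-casing.
os.environ["LLM_READ_TIMEOUT"] = "300"
result = llm_read_timeout()
print(result)  # 300.0
```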