fix: notify users via Slack when agent hits model call step limit #1204
Open
langsmith-forge[bot] wants to merge 1 commit into main from
- Root cause: `GraphRecursionError` at 1000 steps bypassed all `@after_agent` middleware, including `open_pr_if_needed`, leaving users with no notification.
- Change: Added `ModelCallLimitMiddleware(run_limit=60)` to intercept gracefully before the hard recursion limit, and added the `notify_step_limit_reached` `@after_agent` middleware to post a Slack thread reply when the limit fires.
- Verified: 107 existing tests pass, no regressions.
Problem
The agent silently terminates after hitting a 1000-step recursion limit without sending any Slack notification to users. Users start a task, the agent runs for 1000 steps and then stops, leaving users with no completion message, no PR, and no indication of what happened.
Traces:
Root cause
When `DEFAULT_RECURSION_LIMIT = 1_000` is hit, LangGraph raises `GraphRecursionError`, which bypasses all `@after_agent` middleware (including `open_pr_if_needed`). Users get no Slack notification that the agent stopped. In a typical 1000-run trace, ~72 LLM calls + 141 tool calls + 787 chain runs = 1000 runs total, so roughly 72 model calls are enough to hit the hard limit.
Fix
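The choice of `run_limit=60` can be sanity-checked against the trace breakdown in the root cause. The toy model below is a sketch, not LangGraph itself: it follows the PR's own accounting (every run counts toward the 1000-step limit) and assumes runs scale linearly with model calls at the observed ratio of about 13.9 runs per call. `GraphRecursionError` here is a local stand-in class, and `simulate` is a hypothetical helper that only illustrates the two exit paths.

```python
from typing import Callable, List, Optional

# Trace breakdown quoted in the PR for one run that hit the recursion limit.
LLM_CALLS, TOOL_CALLS, CHAIN_RUNS = 72, 141, 787
RECURSION_LIMIT = 1_000  # DEFAULT_RECURSION_LIMIT
RUN_LIMIT = 60           # ModelCallLimitMiddleware(run_limit=60)

total_runs = LLM_CALLS + TOOL_CALLS + CHAIN_RUNS  # 1000
# Assumption: runs scale linearly with model calls at the observed ratio.
runs_per_model_call = total_runs / LLM_CALLS      # ~13.9


class GraphRecursionError(Exception):
    """Toy stand-in for LangGraph's error of the same name."""


def simulate(
    run_limit: Optional[int],
    after_agent_hooks: List[Callable[[List[str]], None]],
) -> List[str]:
    """Toy agent loop that charges `runs_per_model_call` steps per model call.

    Ends gracefully (and runs every after-agent hook) if the model call limit
    is reached first; raises GraphRecursionError, skipping all hooks, if the
    step budget runs out first.
    """
    messages: List[str] = []
    model_calls, steps = 0, 0.0
    while True:
        if run_limit is not None and model_calls >= run_limit:
            # Like exit_behavior="end": inject a marker message, route to end.
            messages.append("Model call limits exceeded: ...")
            break
        model_calls += 1
        steps += runs_per_model_call
        if steps > RECURSION_LIMIT:
            raise GraphRecursionError(f"recursion limit of {RECURSION_LIMIT} reached")
    for hook in after_agent_hooks:  # only reached on the graceful path
        hook(messages)
    return messages


# Estimated steps consumed when the model call limit fires.
print(round(RUN_LIMIT * runs_per_model_call))  # prints 833
```

Under these assumptions, 60 model calls consume roughly 833 of the 1000 steps, so the graceful exit fires first and the after-agent hooks get a chance to run. With no model call limit, the loop exhausts the step budget around the 72nd call and raises before any hook executes.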
Two changes in `agent/server.py` and a new middleware file:
- Added `ModelCallLimitMiddleware(run_limit=60, exit_behavior="end")` to the middleware stack. This intercepts gracefully at 60 model calls, before the 1000-step recursion limit fires: it injects an AI message with `"Model call limits exceeded: ..."` and routes to `end` (which DOES run `@after_agent` middleware).
- Added `notify_step_limit_reached`, a new `@after_agent` middleware in `agent/middleware/notify_step_limit.py` that detects the limit marker in the last AI message and posts a Slack thread reply.
The `notify_step_limit_reached` middleware is placed last in the list so it runs after `open_pr_if_needed` (after_agent hooks run in reverse list order), ensuring any partial work is committed before the user is notified.
Evidence
No new tests were written, since this is a behavioral infrastructure change (adding middleware and a config parameter). Tests asserting the exact middleware configuration or prompt content would be brittle. The production traces are the evidence.
`uv sync --locked --extra dev`
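For illustration, the detection step that `notify_step_limit_reached` performs ("detects the limit marker in the last AI message and posts a Slack thread reply") can be sketched as a plain function. This is a hypothetical sketch, not the code in `agent/middleware/notify_step_limit.py`: the `Message` dataclass stands in for LangChain message objects, `post_thread_reply` stands in for whatever Slack call the real middleware makes, and the reply wording is a placeholder. Only the marker text comes from the PR. In the real middleware this logic would presumably read the agent state's messages inside an `@after_agent` hook.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Marker prefix that, per the PR, ModelCallLimitMiddleware(exit_behavior="end")
# puts in the AI message it injects when the limit fires.
LIMIT_MARKER = "Model call limits exceeded"


@dataclass
class Message:
    """Hypothetical stand-in for a LangChain message (type and text only)."""
    type: str      # e.g. "human", "ai", "tool"
    content: str


def last_ai_message(messages: Sequence[Message]) -> Optional[Message]:
    """Return the most recent AI message, or None if there is none."""
    for msg in reversed(messages):
        if msg.type == "ai":
            return msg
    return None


def maybe_notify_step_limit(
    messages: Sequence[Message],
    post_thread_reply: Callable[[str], None],
) -> bool:
    """Post a Slack thread reply if the run ended on the model call limit.

    `post_thread_reply` is a hypothetical callback that sends text to the
    task's Slack thread. Returns True when a reply was posted.
    """
    last = last_ai_message(messages)
    if last is None or LIMIT_MARKER not in last.content:
        return False  # normal completion: no extra notification
    # Placeholder wording, not the PR's actual message.
    post_thread_reply(
        "I stopped because this run reached its model call limit "
        "before the task was finished."
    )
    return True


# Usage: a run that ended on the limit produces exactly one reply.
sent = []
history = [
    Message("human", "Please fix the flaky test"),
    Message("ai", f"{LIMIT_MARKER}: ..."),
]
maybe_notify_step_limit(history, sent.append)
print(len(sent))  # prints 1
```

Checking only the last AI message, rather than any message in the history, keeps the hook quiet on runs that finished normally, since the marker is only injected as the final AI message when the limit fires.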