fix: notify users via Slack when agent hits model call step limit by langsmith-forge[bot] · Pull Request #1204 · langchain-ai/open-swe

langsmith-forge · 2026-04-17T00:56:50Z

Problem

The agent silently terminates after hitting a 1000-step recursion limit without sending any Slack notification to users. Users start a task, the agent runs for 1000 steps and then stops, leaving users with no completion message, no PR, and no indication of what happened.

Traces:

Root cause

When DEFAULT_RECURSION_LIMIT = 1_000 is hit, LangGraph raises GraphRecursionError which bypasses all @after_agent middleware (including open_pr_if_needed). Users get no Slack notification that the agent stopped. In a typical 1000-run trace: ~72 LLM calls + 141 tool calls + 787 chain runs = 1000 total — so ~72 model calls hits the hard limit.

Fix

Two changes in agent/server.py and a new middleware file:

Added ModelCallLimitMiddleware(run_limit=60, exit_behavior="end") to the middleware stack. This intercepts gracefully at 60 model calls — before the 1000-step recursion limit fires — injecting an AI message with "Model call limits exceeded: ..." and routing to end (which DOES run @after_agent middleware).
Added notify_step_limit_reached — a new @after_agent middleware in agent/middleware/notify_step_limit.py — that detects the limit marker in the last AI message and posts a Slack thread reply:

"I've reached my maximum step limit and had to stop. The task may be incomplete. You can retry with a more focused request, or ask me to continue from where I left off."

The notify_step_limit_reached middleware is placed last in the list so it runs after open_pr_if_needed (after_agent hooks run in reverse list order), ensuring any partial work is committed before the user is notified.

Evidence

No new tests written — this is a behavioral infrastructure change (adding middleware + a config parameter). Tests asserting exact middleware configuration or prompt content would be brittle. The production traces are the evidence.

CI checks pass locally
Existing tests pass, no regressions (107 tests pass with uv sync --locked --extra dev)
Change is minimal and scoped — no drive-by refactors

- Root cause: GraphRecursionError at 1000 steps bypassed all @after_agent middleware including open_pr_if_needed, leaving users with no notification - Change: Added ModelCallLimitMiddleware(run_limit=60) to intercept gracefully before the hard recursion limit, and added notify_step_limit_reached @after_agent middleware to post a Slack thread reply when the limit fires - Verified: 107 existing tests pass, no regressions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: notify users via Slack when agent hits model call step limit#1204

fix: notify users via Slack when agent hits model call step limit#1204
langsmith-forge[bot] wants to merge 1 commit intomainfrom
auto/agent-fix-recursion-limit-notification

langsmith-forge Bot commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

langsmith-forge Bot commented Apr 17, 2026

Problem

Root cause

Fix

Evidence

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants