feat(resilience): API robustness improvements with UI settings sync #208
Open
Kaguya-19 wants to merge 4 commits into
…hints

- Extend CanonicalModelErrorCode with billing, model_not_found, context_overflow, image_too_large, payload_too_large
- Add userHint + settingsFix fields to CanonicalModelError for actionable user-facing guidance on every classified error
- Expand error pattern matching (20+ patterns) covering Ollama, llama.cpp, vLLM, Bedrock, Chinese error messages, etc.
- Add 402 disambiguation: billing exhaustion vs transient rate limit
- Add sanitizeErrorMessage: extract <title> from HTML error pages, normalize whitespace, truncate overly long messages
- Propagate userHint through AgentError and classifyModelError
- Make billing/model_not_found/auth_error fallback-eligible
- Teach ContextOverflowRecovery about context_overflow + image_too_large

Co-authored-by: Cursor <cursoragent@cursor.com>
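The sanitizeErrorMessage step described in this commit (extract <title> from HTML error pages, normalize whitespace, truncate) could look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's code: the signature, the 300-character default, and the HTML detection heuristic are all hypothetical.

```typescript
// Hypothetical sketch: the PR's real sanitizeErrorMessage may differ.
const MAX_MESSAGE_LENGTH = 300; // assumed limit, not taken from the PR

function sanitizeErrorMessage(
  raw: string,
  maxLength: number = MAX_MESSAGE_LENGTH,
): string {
  let message = raw;

  // If the body looks like an HTML error page, keep only its <title> text.
  // Pages without a <title> fall through and are only normalized/truncated.
  if (/<html[\s>]|<!doctype html/i.test(message)) {
    const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(message);
    if (title) {
      message = title[1];
    }
  }

  // Collapse runs of whitespace (including newlines) into single spaces.
  message = message.replace(/\s+/g, " ").trim();

  // Truncate overly long messages, marking the cut with an ellipsis.
  if (message.length > maxLength) {
    message = message.slice(0, maxLength - 1).trimEnd() + "…";
  }
  return message;
}
```

With these assumptions, a gateway's HTML 502 page collapses to its title text, and a multi-line provider message becomes a single line.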
1. classifySemanticError: move RATE_LIMIT and BILLING patterns before CONTEXT_OVERFLOW to prevent "input tokens per minute" being misclassified as context overflow.
2. statusCodeToCode 402: check BILLING_PATTERN first so explicit billing exhaustion messages are never mistaken for transient rate limits (avoids futile retries).
3. DefaultContextRuntime inline fallback: align with ContextOverflowRecovery by handling the image_too_large and context_overflow codes and checking the recoverableViaCompact flag.

Co-authored-by: Cursor <cursoragent@cursor.com>
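The ordering fix in points 1 and 2 can be sketched as follows. The function names and BILLING_PATTERN come from the commit message, but the regexes, the reduced ErrorCode union, and the choice to treat a non-billing 402 as a rate limit are assumptions made for illustration only.

```typescript
// Reduced, hypothetical code set; the PR's CanonicalModelErrorCode has more members.
type ErrorCode = "rate_limit" | "billing" | "context_overflow" | "unknown";

// Illustrative patterns only; the PR's actual regexes are not shown here.
const RATE_LIMIT_PATTERN = /rate.?limit|too many requests|tokens per minute/i;
const BILLING_PATTERN = /insufficient (credits?|balance|quota)|billing/i;
const CONTEXT_OVERFLOW_PATTERN =
  /context (length|window)|maximum context|input tokens/i;

function classifySemanticError(message: string): ErrorCode {
  // Rate-limit and billing checks run first, so a message such as
  // "input tokens per minute" never reaches the context-overflow pattern.
  if (RATE_LIMIT_PATTERN.test(message)) return "rate_limit";
  if (BILLING_PATTERN.test(message)) return "billing";
  if (CONTEXT_OVERFLOW_PATTERN.test(message)) return "context_overflow";
  return "unknown";
}

function statusCodeToCode(status: number, message: string): ErrorCode {
  if (status === 402) {
    // Explicit billing exhaustion wins; otherwise assume a transient limit.
    return BILLING_PATTERN.test(message) ? "billing" : "rate_limit";
  }
  if (status === 429) return "rate_limit";
  return classifySemanticError(message);
}
```

If the context-overflow check ran first in this sketch, "input tokens per minute exceeded" would match the `input tokens` alternative and trigger compaction instead of a backoff, which is the misclassification the commit describes.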
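1. Parse the Retry-After HTTP header and retry hints embedded in error messages
2. Stream idle timeout (default 5 minutes) so a connection that only appears alive cannot hang forever
3. Provider-level configurable retry policy (provider.retry now takes effect)
4. Mid-stream 429 retry: resume from the checkpoint instead of terminating immediately
5. Retry progress is visible to the user (Reconnecting... 2/5)
6. Provider health tracking (simple circuit breaker: healthy/degraded/open)

Co-authored-by: Cursor <cursoragent@cursor.com>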
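For item 1, a minimal sketch of Retry-After parsing is shown below. HTTP defines the header value as either a number of seconds or an HTTP-date; the function names, return convention (milliseconds or undefined), and the message regex are hypothetical and are not taken from the PR, which only names its helpers as parseRetryAfter*.

```typescript
// Hypothetical helpers; names and signatures are assumptions.
function parseRetryAfterHeader(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (!value) return undefined;
  const trimmed = value.trim();

  // Form 1: delta-seconds, a non-negative integer.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: an HTTP-date; convert it to a delay relative to now.
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

function parseRetryAfterMessage(message: string): number | undefined {
  // Matches hints such as "try again in 20s" or "retry after 1.5 seconds".
  const match =
    /(?:retry|try again) (?:after|in) (\d+(?:\.\d+)?)\s*(ms|milliseconds?|s|sec|seconds?)\b/i.exec(
      message,
    );
  if (!match) return undefined;
  const amount = Number(match[1]);
  const unit = match[2].toLowerCase();
  const isMillis = unit.startsWith("ms") || unit.startsWith("milli");
  return isMillis ? amount : Math.round(amount * 1000);
}

// Prefer the header, then the message hint, then the caller's own backoff.
function resolveRetryDelay(
  header: string | null | undefined,
  message: string,
  fallbackMs: number,
): number {
  return (
    parseRetryAfterHeader(header) ??
    parseRetryAfterMessage(message) ??
    fallbackMs
  );
}
```

Returning undefined rather than 0 for an unparseable value lets the caller distinguish "retry immediately" from "no hint available" and fall back to its own backoff schedule.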
- Add transientRetry panel in RouterSection Advanced area
- Add per-provider retry config in ProviderCard Advanced area
- Passthrough userHint from gateway → bridge → chat error render
- Add retryProgress structured rendering in live status step
- Add GatewayEvent userHint type field
- Add i18n keys for transientRetry and provider retry (en + zh-CN)

Co-authored-by: Cursor <cursoragent@cursor.com>
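The structured retryProgress rendering could be driven by a small formatter like the one below. The RetryProgress shape and its field names are assumptions (the PR names the event but does not show its fields), and the "Reconnecting... 2/5" wording follows the example given in the resilience commit.

```typescript
// Assumed event shape; field names are hypothetical.
interface RetryProgress {
  attempt: number; // 1-based attempt currently being made
  maxAttempts: number; // total attempts allowed by the retry policy
  delayMs?: number; // wait before this attempt, if known
}

function formatRetryProgress(progress: RetryProgress): string {
  const base = `Reconnecting... ${progress.attempt}/${progress.maxAttempts}`;
  if (progress.delayMs === undefined) return base;

  // Round up so a 1.5 s wait is never shown as "1s".
  const seconds = Math.ceil(progress.delayMs / 1000);
  return `${base} (retrying in ${seconds}s)`;
}
```

Keeping the event structured and formatting it only at render time would let the en and zh-CN translations supply their own wording instead of parsing a prebuilt English string.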
Summary
- …userHint), SettingsFix suggestions, and reordered pattern matching to prevent misclassification (rate_limit/billing before context_overflow)
- Retry-After header/message parsing, stream idle timeout, circuit breaker (ProviderHealthTracker), mid-stream rate-limit recovery, and retryProgress event broadcasting
- …billing, model_not_found, auth_error) now eligible for provider fallback
- transientRetry panel and per-provider retry config in Advanced sections, userHint error rendering with hint icon, structured retryProgress live status display, i18n keys (en + zh-CN)

Commits (4)
- feat(errors): user-friendly error classification with actionable hints
- fix(errors): reorder pattern matching and harden edge cases
- feat(resilience): remote API robustness (retry, circuit breaker, stream idle timeout)
- feat(ui): sync robust-api settings to frontend (settings panels, error hints, retry progress)

Changed files (31 files, +969/-54)
Backend (20 files)
- src/model/errors/normalizeModelError.ts: semantic error classification, sanitizeErrorMessage, resolveUserHint
- src/model/protocol/errors.ts: canonical error codes, regex patterns, parseRetryAfter*
- src/model/streaming/streamModel.ts: configurable retries, stream idle timeout
- src/router/RouterRuntime.ts: ProviderHealthTracker integration, mid-stream continuation, retry progress events
- src/router/health/ProviderHealthTracker.ts: circuit breaker (healthy/degraded/open/half_open)
- src/router/fallback/runFallbackChain.ts: fallback-eligible non-retryable codes
- src/gateway/client/InProcessGateway.ts: broadcastRetryProgress, userHint passthrough
- src/gateway/protocol/types.ts: GatewayEvent userHint field

Frontend (11 files)
- PilotDeckConfigTab.tsx: transientRetry panel + provider retry Advanced section
- MessageComponent.tsx: userHint amber hint box
- MessagesPaneV2.tsx: retryProgress live status step
- pilotdeck-bridge.js: retry_progress + userHint passthrough
- {en,zh-CN}/{settings,chat}.json: translation keys

Test plan
- tsc --noEmit: 0 new errors
- vitest run: 12 files / 65 tests passed
- vite build: 3372 modules compiled successfully

Made with Cursor