Conversation
Sync DSPy modules (those without aforward) were calling instance() directly inside the async execute_pipeline, blocking the event loop for the entire LLM round-trip. This starved async requests and serialized all concurrent calls through a single thread. Wrap the sync call in asyncio.to_thread() so it runs in a worker thread while the event loop stays free.

Benchmark at 500ms mock LLM delay, 100 concurrent users:

- Async RPS: 0.32 -> 29.5 (+9,122%)
- Aggregate RPS: 2.25 -> 59.0 (+2,520%)
- P95 latency: 39,000ms -> 1,800ms (-95%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
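The fix above can be sketched as follows. This is a minimal illustration, not the project's actual code: forward here stands in for a sync DSPy module call, and the 500ms sleep mocks the LLM round-trip from the benchmark.

```python
import asyncio
import time

def forward(question: str) -> str:
    # Stand-in for a sync module: blocks for the whole LLM round-trip.
    time.sleep(0.5)  # mock 500ms LLM delay, as in the benchmark
    return f"answer:{question}"

async def execute_pipeline(question: str) -> str:
    # Calling forward(question) directly here would block the event loop
    # for 500ms. asyncio.to_thread() moves it to a worker thread, so the
    # loop keeps serving other requests while this one awaits the result.
    return await asyncio.to_thread(forward, question)

async def main() -> None:
    start = time.perf_counter()
    answers = await asyncio.gather(*(execute_pipeline(f"q{i}") for i in range(4)))
    # Four concurrent calls finish in roughly one round-trip, not four.
    print(f"{len(answers)} answers in {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```

With the direct call, four requests would serialize to about 2 seconds; routed through worker threads they overlap and finish in about one sleep's worth of wall time.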
- Bounded thread pool executor with configurable --sync-workers flag (default matches Python's min(32, cpu+4) so baseline perf is preserved)
- Detect native aforward() at discovery time instead of per-request hasattr
- Batch dispatch (instance.batch) runs through the executor to avoid blocking the event loop
- Queue-based JSONL log writer eliminates write contention under concurrency
- Per-program semaphore backpressure with 30s queue timeout before 429
- Health check differentiation: /health/live and /health/ready for K8s probes
- All 6 scaffold templates include aforward() using acall() for async-by-default
- Config: server.sync_worker_threads, server.max_concurrent_per_program

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
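The bounded executor and per-program semaphore items above could look roughly like this. Everything here is illustrative except the min(32, cpu+4) default, the 30s queue timeout, and the 429 mapping, which come from the change description; dispatch, _sem, and the limit of 8 are made-up names and values.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor

# Default matches Python's own ThreadPoolExecutor sizing, so baseline
# performance is unchanged unless --sync-workers overrides it.
SYNC_WORKERS = min(32, (os.cpu_count() or 1) + 4)
QUEUE_TIMEOUT_S = 30.0  # how long a request may queue before a 429

executor = ThreadPoolExecutor(max_workers=SYNC_WORKERS)
_semaphores: dict[str, asyncio.Semaphore] = {}

def _sem(program: str, limit: int = 8) -> asyncio.Semaphore:
    # One semaphore per program caps that program's in-flight sync work.
    return _semaphores.setdefault(program, asyncio.Semaphore(limit))

async def dispatch(program: str, fn, *args):
    sem = _sem(program)
    try:
        # Backpressure: queue up to 30s for a slot instead of failing fast.
        await asyncio.wait_for(sem.acquire(), timeout=QUEUE_TIMEOUT_S)
    except asyncio.TimeoutError:
        # The server would map this to an HTTP 429 response.
        raise RuntimeError("queue timeout")
    try:
        loop = asyncio.get_running_loop()
        # Sync work runs in the bounded pool; the event loop stays free.
        return await loop.run_in_executor(executor, fn, *args)
    finally:
        sem.release()
```

Bursts beyond the per-program limit wait in the semaphore queue rather than getting instant 429s, which is exactly what the stress_backpressure test below exercises.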
Scaffolded aforward silently diverges from forward when developers customize their modules. Remove it from all 6 templates so forward is the only code path by default. Add an async modules section to the generated README explaining when and why to add aforward, the requirement to keep both methods in sync, and that forward alone is sufficient for most workloads.
…ion tests

loop.run_in_executor does not propagate contextvars, unlike asyncio.to_thread. This meant dspy.context(lm=request_lm) overrides were invisible to sync modules running in the thread pool — they silently used the global LM instead of the per-request copy. Fixed by copying the context before dispatch, mirroring what asyncio.to_thread does internally.

Added unit tests (test_executor.py) that verify contextvar propagation through the executor for single requests, concurrent requests with different values, and dspy.context(lm=...) specifically.

Updated load test infrastructure to catch model routing bugs:

- Mock LLM server echoes the requested model name in the answer
- Fixture project uses two distinct models with per-program overrides
- Locustfile validates each response contains the expected model name
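The bug and fix are easy to reproduce in isolation. In this sketch, current_lm is an illustrative stand-in for DSPy's per-request LM override; the copy_context pattern is the same one asyncio.to_thread applies internally.

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for the per-request LM override.
current_lm = contextvars.ContextVar("current_lm", default="global-lm")

executor = ThreadPoolExecutor(max_workers=4)

def sync_forward() -> str:
    # Runs in a worker thread; without propagation it sees the default.
    return current_lm.get()

async def dispatch_with_context() -> str:
    loop = asyncio.get_running_loop()
    # run_in_executor does NOT carry the caller's contextvars into the
    # worker thread, so copy the context and run the sync call inside it.
    ctx = contextvars.copy_context()
    return await loop.run_in_executor(executor, ctx.run, sync_forward)

async def main() -> None:
    current_lm.set("request-lm")  # the per-request override
    loop = asyncio.get_running_loop()
    naive = await loop.run_in_executor(executor, sync_forward)
    fixed = await dispatch_with_context()
    print(naive, fixed)  # global-lm request-lm

asyncio.run(main())
```

The naive dispatch silently falls back to the global value, which is the model-routing bug; the ctx.run wrapper makes the per-request override visible in the pool.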
…pressure

- stress_log_integrity: fires concurrent requests, validates every JSONL log line is valid JSON with correct fields and that per-program counts match
- stress_error_storm: tests server resilience when the LLM backend returns errors; validates clean error responses, no crashes, no hung requests
- stress_backpressure: sends a burst exceeding the semaphore limit, validates requests queue properly (no instant 429s) and the server recovers
- run_stress_tests.sh: harness that orchestrates all 3 tests with appropriate server configs (normal, 90% error rate, semaphore=3)
- mock_lm_server: add MOCK_ERROR_RATE env var for simulating LLM failures
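A mock LLM server with both behaviors described above (echoing the requested model name, and MOCK_ERROR_RATE for error storms) can be sketched in a few lines. Only the MOCK_ERROR_RATE env var name and the model-echo behavior come from the source; the handler class, request payload shape, and response fields are assumptions.

```python
import json
import os
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# MOCK_ERROR_RATE=0.9 makes ~90% of calls fail, as in stress_error_storm.
ERROR_RATE = float(os.environ.get("MOCK_ERROR_RATE", "0"))

class MockLM(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if random.random() < ERROR_RATE:
            self.send_response(500)  # simulated backend failure
            self.end_headers()
            return
        # Echo the requested model name so load tests can assert that
        # per-program overrides routed to the right model.
        answer = json.dumps({"answer": f"model={body.get('model')}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(answer)))
        self.end_headers()
        self.wfile.write(answer)

    def log_message(self, *args):
        pass  # keep test output quiet
```

To run it standalone: HTTPServer(("127.0.0.1", 8080), MockLM).serve_forever(). A Locustfile can then assert that each response body contains the model name it requested.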
Reference skill covering CLI commands, configuration, module discovery, server endpoints, gateways, concurrency model, testing, and deployment.
This was referenced Feb 28, 2026