Perf Testing & Updates#38

Draft
dbreunig wants to merge 8 commits into main from perf
Conversation

@dbreunig (Contributor)

No description provided.

dbreunig and others added 8 commits February 21, 2026 09:02
Sync DSPy modules (those without an aforward() method) were calling instance()
directly inside the async execute_pipeline, blocking the event loop
for the entire LLM round-trip. This starved async requests and
serialized all concurrent calls through a single thread.

Wrap the sync call in asyncio.to_thread() so it runs in a worker
thread while the event loop stays free.

Benchmark at 500ms mock LLM delay, 100 concurrent users:
- Async RPS: 0.32 -> 29.5 (+9,122%)
- Aggregate RPS: 2.25 -> 59.0 (+2,520%)
- P95 latency: 39,000ms -> 1,800ms (-95%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
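A minimal sketch of the change, with `call_sync_module` and `execute_pipeline` as stand-ins for the server's actual names:

```python
import asyncio

def call_sync_module(instance, question):
    # A sync DSPy module's __call__ performs the full LLM round-trip
    # on the calling thread; before the fix this ran on the event loop.
    return instance(question=question)

async def execute_pipeline(instance, question):
    # asyncio.to_thread hands the blocking call to a worker thread,
    # so the event loop stays free to serve other requests.
    return await asyncio.to_thread(call_sync_module, instance, question)
```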
- Bounded thread pool executor with configurable --sync-workers flag
  (default matches Python's min(32, cpu+4) so baseline perf is preserved)
- Detect native aforward() at discovery time instead of per-request hasattr
- Batch dispatch (instance.batch) runs through executor to avoid blocking event loop
- Queue-based JSONL log writer eliminates write contention under concurrency
- Per-program semaphore backpressure with 30s queue timeout before 429
- Health check differentiation: /health/live and /health/ready for K8s probes
- All 6 scaffold templates include aforward() using acall() for async-by-default
- Config: server.sync_worker_threads, server.max_concurrent_per_program

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scaffolded aforward silently diverges from forward when developers customize
their modules. Remove it from all 6 templates so forward is the only code
path by default.

Add async modules section to generated README explaining when and why to
add aforward, the requirement to keep both methods in sync, and that
forward alone is sufficient for most workloads.
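The divergence hazard, sketched without DSPy (the class and strings are illustrative): a developer customizes forward but the scaffolded aforward keeps the old behavior, so sync and async callers get different answers:

```python
import asyncio

class CustomizedModule:
    def forward(self, question):
        answer = f"raw answer to {question}"
        return answer.upper()  # post-processing added by the developer

    async def aforward(self, question):
        # Scaffolded copy, never updated: async callers silently get
        # the un-postprocessed answer.
        return f"raw answer to {question}"
```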
…ion tests

loop.run_in_executor does not propagate contextvars, unlike asyncio.to_thread.
This meant dspy.context(lm=request_lm) overrides were invisible to sync modules
running in the thread pool — they silently used the global LM instead of the
per-request copy. Fixed by copying context before dispatch, mirroring what
asyncio.to_thread does internally.
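A sketch of the fix, assuming a ThreadPoolExecutor-based dispatch; `request_lm` is an illustrative contextvar standing in for DSPy's per-request state:

```python
import asyncio
import contextvars

request_lm = contextvars.ContextVar("request_lm", default="global-lm")

async def dispatch_sync(executor, fn, *args):
    loop = asyncio.get_running_loop()
    # run_in_executor does not propagate contextvars, so snapshot the
    # current context and run fn inside it -- the same thing
    # asyncio.to_thread does internally.
    ctx = contextvars.copy_context()
    return await loop.run_in_executor(executor, lambda: ctx.run(fn, *args))
```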

Added unit tests (test_executor.py) that verify contextvar propagation through
the executor for single requests, concurrent requests with different values,
and dspy.context(lm=...) specifically.

Updated load test infrastructure to catch model routing bugs:
- Mock LLM server echoes requested model name in the answer
- Fixture project uses two distinct models with per-program overrides
- Locustfile validates each response contains the expected model name
…pressure

- stress_log_integrity: fires concurrent requests, validates every JSONL log
  line is valid JSON with correct fields and per-program counts match
- stress_error_storm: tests server resilience when LLM backend returns errors,
  validates clean error responses, no crashes, no hung requests
- stress_backpressure: sends burst exceeding semaphore limit, validates
  requests queue properly (no instant 429s) and server recovers
- run_stress_tests.sh: harness that orchestrates all 3 tests with appropriate
  server configs (normal, 90% error rate, semaphore=3)
- mock_lm_server: add MOCK_ERROR_RATE env var for simulating LLM failures
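The MOCK_ERROR_RATE knob might be implemented along these lines; only the env var name comes from the commit, the handler shape is illustrative:

```python
import os
import random

def maybe_fail():
    # Read the rate per call so tests can flip the env var at runtime.
    error_rate = float(os.environ.get("MOCK_ERROR_RATE", "0"))
    if random.random() < error_rate:
        return 500, {"error": "simulated LLM failure"}
    return 200, {"answer": "ok"}
```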
Reference skill covering CLI commands, configuration, module discovery,
server endpoints, gateways, concurrency model, testing, and deployment.