Outcome 5: parallel/interleaved replication execution design #177

Open
DianaMeda wants to merge 1 commit into jumbocontext:main from DianaMeda:outcome-5-parallel-design

@DianaMeda

Contributor

What

This PR adds the design for the parallel-execution challenge (GOAL.md, Challenges): a docs-only deliverable with no implementation. It branches off main and is independent of the open Outcome 4/5 code PRs.

docs/parallel-replication-design.md covers:

  • Bounded worker-pool model with a --concurrency N knob and a pure, order-independent runWithConcurrency primitive.
  • Key finding: parallelism does not bias the measurement — all scored dimensions are artefact/count-based (not wall-clock), and aggregateReplications is order-independent. So concurrency risks are operational (rate limits, OOM, failures), not statistical.
  • Resource-contention analysis for non-API agent harnesses; provider rate limits are the binding constraint → recommended default N = 1, opt-in higher, with a method to pick N for a host.
  • Arm interleaving go/no-go: GO (per-session jumbo/baseline alternation) — a contained ab-runner refactor, orthogonal to the worker pool.
  • Decomposition into follow-on implementation goals, and resolution of GOAL.md's two open questions.
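For readers who want a concrete picture of the bounded worker-pool primitive, here is a minimal sketch. It is not the design doc's code: the signature, the `Settled` result type, and the choice to record per-item failures instead of aborting are all assumptions made for illustration. Results are written by input index, so the output order does not depend on which replication finishes first.

```typescript
// Hypothetical result type: the PR does not say how failures are reported.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Sketch of a bounded pool. The name comes from the PR; the signature is assumed.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<Settled<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }

  const results: Settled<R>[] = new Array(items.length);
  let next = 0; // shared cursor; safe because callbacks run on a single thread

  // Each lane pulls the next unclaimed index until the input is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index], index) };
      } catch (error) {
        // One failed replication does not cancel the others.
        results[index] = { ok: false, error };
      }
    }
  }

  // At most `limit` lanes are in flight, so at most `limit` workers run at once.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

With `limit = 1` this degenerates to sequential execution, which matches the recommended default; a `--concurrency N` flag would simply be passed through as `limit`.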

Decisions

Registered in the evals Jumbo project (dogfooding): default replication concurrency = 1 (opt-in higher); arm interleaving = GO (independent goal).
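As an illustration of the interleaving decision, the sketch below builds a run order that alternates arms session by session instead of running one arm to completion first. This is one possible reading of "per-session jumbo/baseline alternation"; the `Arm` and `ScheduledSession` types and the `interleaveArms` helper are hypothetical and do not come from ab-runner.

```typescript
// Hypothetical types for illustration; not taken from ab-runner.
type Arm = "jumbo" | "baseline";

interface ScheduledSession {
  session: number; // zero-based session index within its arm
  arm: Arm;
}

// Build a schedule in which each session index runs once per arm, back to back,
// so both arms see similar conditions (time of day, provider load) per session.
function interleaveArms(sessionsPerArm: number, first: Arm = "jumbo"): ScheduledSession[] {
  if (!Number.isInteger(sessionsPerArm) || sessionsPerArm < 0) {
    throw new RangeError("sessionsPerArm must be a non-negative integer");
  }
  const second: Arm = first === "jumbo" ? "baseline" : "jumbo";
  const schedule: ScheduledSession[] = [];
  for (let session = 0; session < sessionsPerArm; session++) {
    schedule.push({ session, arm: first });
    schedule.push({ session, arm: second });
  }
  return schedule;
}

// Example: two sessions per arm yields jumbo, baseline, jumbo, baseline.
const order = interleaveArms(2).map((s) => `${s.arm}#${s.session}`);
```

Because the schedule is just an ordered list, it stays independent of how many sessions run at once, which is consistent with the PR's description of interleaving as orthogonal to the worker pool.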

Scope

Design only — the implementation goals it decomposes are follow-ons.

🤖 Generated with Claude Code

Design deliverable for the parallel-execution challenge (Jumbo goal fa0394d4):
a design doc, not implementation.

- docs/parallel-replication-design.md: bounded worker-pool model with a
  --concurrency N knob and an order-independent runWithConcurrency primitive;
  resource-contention analysis for non-API agent harnesses (rate limits as the
  binding constraint) with a recommended default N=1; arm-interleaving go/no-go
  (GO, per-session alternation); decomposition into follow-on implementation
  goals; and resolution of GOAL.md's two open questions.

Key finding: parallelism does not bias the measurement (all scored dimensions
are artefact/count-based, not wall-clock; aggregateReplications is
order-independent), so concurrency risks are operational, not statistical.

Two decisions registered in Jumbo (default concurrency = 1; arm interleaving = GO).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>