- Execution modes: "GHA: Native" and "Local: Docker (default)/Native (optional)".
- Emphasis on reproducibility and safety: When running in Docker, pass environment variables explicitly and mount test files as read-only.
- Container name unified as `TS_BENCH_CONTAINER = "ts-bench-container"`.
- GHA (Native)
  - Default for `use_docker` in `.github/workflows/benchmark.yml` is `false`.
  - Agent CLIs are installed on the runner and executed directly.
- Local (Docker, default)
  - Uses the included `Dockerfile` to eliminate environment differences.
  - Run with CLI option `--docker`.
- Local (Native, for debugging)
  - Install agent CLIs on the host and run without `--docker`.
- Container name: `ts-bench-container` (`TS_BENCH_CONTAINER`)
- Base image: `oven/bun:1.2.22-slim`
- Included components:
  - Git/Curl/NPM/Unzip/bzip2
  - Node.js 20 (via NodeSource) + npm 10
  - corepack (enables `corepack@0.29.4`, compatible with Node 20)
  - PATH additions: `/root/.local/bin`
  - Agent bootstrap script: `/app/scripts/run-agent.sh`
- Agent CLIs are installed on demand by `run-agent.sh`. They are not baked into the image, to keep layers small.
  - `scripts/run-agent.sh` supports aider, goose, cursor, and Node-based CLIs, and always installs to `/root/.local`.
  - The host-side cache directory `TS_BENCH_CLI_CACHE` (default: `~/.cache/ts-bench/cli`) is mounted for persistence across runs.
  - `scripts/smoke-agents.sh` verifies each agent's CLI with `--version` inside Docker, using the same cache mount to avoid redundant installations.
- Base args: `docker run --rm -i`
- Workspace: Mount host exercise directory to `/workspace` and set as working directory.
- Test files: Mount individually as read-only (`-v host:container:ro`).
- Environment variables: Only explicitly set keys are passed with `-e KEY=VALUE` (no implicit passthrough).
- Implementation reference: `src/execution/docker-strategy.ts` / `src/utils/docker.ts`
- Local Execution
  - Change to each exercise directory before running (for simple path resolution).
  - Implementation reference: `src/runners/test.ts` / `src/runners/test-only.ts`
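
The Docker run rules listed above (base args, workspace mount, read-only test files, explicit environment variables) could be assembled roughly as follows. This is a minimal sketch, not the code in `src/utils/docker.ts`: the `DockerRunSpec` type, the `buildDockerArgs` function, and the field names are hypothetical, and it assumes test file paths are given relative to the exercise directory.

```typescript
// Hypothetical shape for illustration; not taken from the repository.
interface DockerRunSpec {
  image: string; // e.g. "ts-bench-container"
  workspaceDir: string; // host exercise directory
  testFiles: string[]; // assumed to be relative to workspaceDir
  env: { [key: string]: string | undefined };
  command: string[];
}

function buildDockerArgs(spec: DockerRunSpec): string[] {
  // Base args: ephemeral container with stdin attached.
  const args: string[] = ["run", "--rm", "-i"];

  // Workspace: mount the exercise directory and make it the working directory.
  args.push("-v", `${spec.workspaceDir}:/workspace`, "-w", "/workspace");

  // Test files: mount each one individually as read-only.
  for (const file of spec.testFiles) {
    args.push("-v", `${spec.workspaceDir}/${file}:/workspace/${file}:ro`);
  }

  // Environment: pass only keys that actually have a value.
  for (const key of Object.keys(spec.env)) {
    const value = spec.env[key];
    if (value !== undefined && value !== "") {
      args.push("-e", `${key}=${value}`);
    }
  }

  args.push(spec.image, ...spec.command);
  return args;
}
```

Returning an argument array rather than a single string lets the caller hand it to a process spawner without shell quoting. Keys whose value is unset or empty produce no `-e` flag at all, which matches the "no implicit passthrough" rule.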
- Common test command: `corepack yarn && corepack yarn test`
- Exercism exercises assume Yarn v4 (e.g., `packageManager: yarn@4.5.1`).
- In the container, `corepack@0.29.4` is enabled (compatible with Node 20).
- Each agent requires the appropriate API key for its provider; if a required key is missing (e.g., `OPENAI_API_KEY` for OpenAI agents), execution fails immediately with an error.
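
The fail-fast behavior for missing API keys can be pictured with a small check like the one below. It is only a sketch under assumptions: the `REQUIRED_KEY` table and the `requireApiKey` function are hypothetical names, and the provider-to-variable mapping is illustrative rather than the project's actual table.

```typescript
// Hypothetical mapping for illustration; the real project may map providers differently.
const REQUIRED_KEY: { [provider: string]: string | undefined } = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

// Returns the key's value, or throws before any agent work starts.
function requireApiKey(
  provider: string,
  env: { [key: string]: string | undefined }
): string {
  const keyName = REQUIRED_KEY[provider];
  if (keyName === undefined) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const value = env[keyName];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable ${keyName}`);
  }
  return value;
}
```

Passing the environment in as a parameter (for example `process.env` under Bun or Node) keeps the check easy to exercise without touching the real environment.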
- Workflow: `.github/workflows/benchmark.yml`
- Default for `use_docker` is `false` (native). `--docker` is added only when specified.
- Agent CLIs are installed on the runner ("Install agent CLI (local mode)" step).
- Secrets: Pass `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` / `GROQ_API_KEY` etc. via `env`.
- Example command: `bun src/index.ts --agent <agent> --model <model> [--docker] ...`
- `--agent <agent>`: Agent to use (claude/goose/aider/codex/gemini/opencode/qwen/cursor)
- `--model <model>`: Model to use
- `--provider <provider>`: openai/anthropic/google/openrouter/dashscope/xai/deepseek
- `--docker`: Switch to Docker execution
- `--exercise <name|N|a,b,c>`: Specify exercise (name / first N / multiple)
- `--exercism-path <path>`: Exercism root (default: `exercism-typescript`)
- `--test-only` / `--print-instructions`: Test only / show instructions
- `--save-result --result-dir <dir>`: Save results
- `--timeout <sec>`: Timeout per exercise (default: 300)
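
The three forms accepted by `--exercise` (a single name, a count N, or a comma-separated list) can be told apart with a small parser. The sketch below is one possible reading, not the project's implementation: `ExerciseSelector`, `parseExerciseSelector`, and `selectExercises` are hypothetical names, and it assumes that an all-digit value means "first N" and that unknown names are silently skipped.

```typescript
// Hypothetical types and functions for illustration only.
type ExerciseSelector =
  | { kind: "first"; count: number }
  | { kind: "names"; names: string[] };

function parseExerciseSelector(value: string): ExerciseSelector {
  const trimmed = value.trim();
  // Assumption: an all-digit value means "the first N exercises".
  if (/^\d+$/.test(trimmed)) {
    return { kind: "first", count: parseInt(trimmed, 10) };
  }
  // Otherwise treat it as one name or a comma-separated list of names.
  const names = trimmed
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  if (names.length === 0) {
    throw new Error("--exercise needs a name, a count, or a list");
  }
  return { kind: "names", names };
}

function selectExercises(all: string[], selector: ExerciseSelector): string[] {
  if (selector.kind === "first") {
    return all.slice(0, selector.count);
  }
  // Assumption: names not present in `all` are dropped rather than rejected.
  return selector.names.filter((name) => all.indexOf(name) !== -1);
}
```

For example, `parseExerciseSelector("2")` selects the first two entries of the exercise list, while `parseExerciseSelector("acronym,bob")` selects those two exercises by name if they exist.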
- Exercise root: `exercism-typescript` (`EXERCISM_PRACTICE_PATH`)
- Exercise path: `exercises/practice/<exercise>`
- Output (example): Use `--save-result --result-dir ./results` to export JSON
- Docker uses `--rm` to discard containers after each run (no state left).
- Test files are mounted read-only (prevents unintended modification during testing).
- Environment variables are only passed explicitly with `-e KEY=VALUE` (no passthrough for unset keys).
- corepack/Yarn versions are fixed to improve reproducibility of dependency resolution.
- Docker execution (default)
  - Build runtime image: `docker build -t ts-bench-container .`
  - Run: `bun src/index.ts --agent aider --model gpt-4o --docker`
    - The first invocation for each agent installs the corresponding CLI inside the ephemeral container via `run-agent.sh`.
- Native execution (debug)
  - Install agent CLIs on host (see GHA install steps)
  - Run: `bun src/index.ts --agent aider --model gpt-4o`
- corepack not found: `npm i -g corepack@0.29.4 && corepack enable`
- Yarn workspace warnings: Run in each exercise directory (handled by design for both Docker/local).
- Agent CLI not found: When using Docker, confirm that `/app/scripts/run-agent.sh` supports the agent; when running without Docker, install the CLI manually on the host.
- Change container name: `src/config/constants.ts`
- Add agent CLIs: Add install steps to `Dockerfile`
- Add environment variables: Only pass those with values (Docker arg `-e KEY=VALUE`); specify as needed
- Update Node/corepack: Update base image/version and check compatibility
- Extend on-demand installation: Update `scripts/run-agent.sh` to support additional agents or custom installers.