Skip to content

Latest commit

 

History

History
128 lines (92 loc) · 5.48 KB

File metadata and controls

128 lines (92 loc) · 5.48 KB

Environment Design

Policy Overview

  • Execution modes: "GHA: Native" and "Local: Docker (default)/Native (optional)".
  • Emphasis on reproducibility and safety: When running in Docker, pass environment variables explicitly and mount test files as read-only.
  • Container name unified as TS_BENCH_CONTAINER = "ts-bench-container".

Execution Modes

  • GHA (Native)

    • Default for use_docker in .github/workflows/benchmark.yml is false.
    • Agent CLIs are installed on the runner and executed directly.
  • Local (Docker, default)

    • Uses the included Dockerfile to eliminate environment differences.
    • Run with CLI option --docker.
  • Local (Native, for debugging)

    • Install agent CLIs on the host and run without --docker.

Container Design

  • Container name: ts-bench-container (TS_BENCH_CONTAINER)

  • Base image: oven/bun:1.2.22-slim

  • Included components:

    • Git/Curl/NPM/Unzip/bzip2
    • Node.js 20 (via NodeSource) + npm 10
    • corepack (Enable corepack@0.29.4 compatible with Node 20)
    • PATH additions: /root/.local/bin
    • Agent bootstrap script: /app/scripts/run-agent.sh
  • Agent CLIs are installed on-demand by run-agent.sh. They are not baked into the image to keep layers small。

    • scripts/run-agent.sh installs agent CLIs on demand (supports aider, goose, cursor, and Node-based CLIs), always installing to /root/.local. The host-side cache directory TS_BENCH_CLI_CACHE (default: ~/.cache/ts-bench/cli) is mounted for persistence across runs.
    • scripts/smoke-agents.sh verifies each agent's CLI with --version inside Docker, using the same cache mount to avoid redundant installations.
    • Base args: docker run --rm -i
    • Workspace: Mount host exercise directory to /workspace and set as working directory.
    • Test files: Mount individually as read-only (-v host:container:ro).
    • Environment variables: Only explicitly set keys are passed with -e KEY=VALUE (no implicit passthrough).
    • Implementation reference: src/execution/docker-strategy.ts / src/utils/docker.ts
  • Local Execution

    • Change to each exercise directory before running (for simple path resolution).
    • Implementation reference: src/runners/test.ts / src/runners/test-only.ts

Testing and Package Management

  • Common test command: corepack yarn && corepack yarn test
  • Exercism exercises assume Yarn v4 (e.g., packageManager: yarn@4.5.1).
  • In the container, corepack@0.29.4 is enabled (compatible with Node 20).
  • Each agent requires the appropriate API key for its provider; if a required key is missing (e.g., OPENAI_API_KEY for OpenAI agents), execution will immediately fail with an error.

GitHub Actions (Native Execution)

  • Workflow: .github/workflows/benchmark.yml
  • Default for use_docker is false (native). Add --docker only when specified.
  • Agent CLIs installed on runner ("Install agent CLI (local mode)" step).
  • Secrets: Pass ANTHROPIC_API_KEY / OPENAI_API_KEY / GROQ_API_KEY etc. via env.
  • Example command: bun src/index.ts --agent <agent> --model <model> [--docker] ...

Main CLI Options

  • --agent <agent>: Agent to use (claude/goose/aider/codex/gemini/opencode/qwen/cursor)
  • --model <model>: Model to use
  • --provider <provider>: openai/anthropic/google/openrouter/dashscope/xai/deepseek
  • --docker: Switch to Docker execution
  • --exercise <name|N|a,b,c>: Specify exercise (name / first N / multiple)
  • --exercism-path <path>: Exercism root (default: exercism-typescript)
  • --test-only / --print-instructions: Test only / show instructions
  • --save-result --result-dir <dir>: Save results
  • --timeout <sec>: Timeout per exercise (default: 300)

Directories and I/O

  • Exercise root: exercism-typescript (EXERCISM_PRACTICE_PATH)
  • Exercise path: exercises/practice/<exercise>
  • Output (example): Use --save-result --result-dir ./results to export JSON

Security / Reproducibility

  • Docker uses --rm to discard containers after each run (no state left).
  • Test files are mounted read-only (prevents unintended modification during testing).
  • Environment variables are only passed explicitly with -e KEY=VALUE (no passthrough for unset keys).
  • corepack/Yarn versions are fixed to improve reproducibility of dependency resolution.

Local Usage

  • Docker execution (default)

    1. Build runtime image: docker build -t ts-bench-container .
    2. Run: bun src/index.ts --agent aider --model gpt-4o --docker
      • The first invocation for each agent installs the corresponding CLI inside the ephemeral container via run-agent.sh.
  • Native execution (debug)

    • Install agent CLIs on host (see GHA install steps)
    • Run: bun src/index.ts --agent aider --model gpt-4o

Troubleshooting

  • corepack not found: npm i -g corepack@0.29.4 && corepack enable
  • Yarn workspace warnings: Run in each exercise directory (handled by design for both Docker/local).
  • Agent CLI not found: When using Docker, confirm /app/scripts/run-agent.sh supports the agent or install the CLI manually on the host when running without Docker.

Customization Guidelines

  • Change container name: src/config/constants.ts
  • Add agent CLIs: Add install steps to Dockerfile
  • Add environment variables: Only pass those with values (Docker arg -e KEY=VALUE); specify as needed
  • Update Node/corepack: Update base image/version and check compatibility
  • Extend on-demand installation: Update scripts/run-agent.sh to support additional agents or custom installers.