blacksmith

Agentic development orchestrator — a LangGraph state machine that drives a PRD's work units, in dependency order, each through plan → implement → test-gate → review → PR, with durable checkpointed state and human approval gates. The Claude Agent SDK is the per-node execution engine; blacksmith operates on a target repository via isolated git clones, the repo's own test/lint toolchain, and gh.

The deliverable is LangGraph fluency — a working, inspectable grasp of its state-machine / checkpoint / human-in-the-loop model. blacksmith now drives most of its own development: it builds its features from Contract v1 PRDs, on itself. See blacksmith-v0-prd.md for the original spec. Writing your own PRD? The PRD authoring guide documents the contract every PRD must conform to.

Status: well past the v0 spine — blacksmith builds its own features end to end, and nearly everything below was shipped that way (write a PRD → blacksmith builds it → review the PR). Current on main: multi-unit execution of a PRD's whole depends_on DAG (topological order, independent units in parallel within a level, one combined PR); clone-based isolation (each run/unit works in a throwaway git clone, never the real checkout); a human-QA path (a human-gated unit opens a draft PR and ends AWAITING_QA, branch preserved); tiered models (implement on Sonnet first, escalate to Opus only on a gate failure); long-term memory (per-repo gate-failure lessons fed back to the planner via a SQLite Store); a local observability suite — per-run/per-unit metrics recording, a blacksmith runs history command, a built-in blacksmith dashboard, and per-call agent transcripts; a rendered interactive CLI (Markdown plans, diffs, live progress) that degrades to plain output when piped; fresh-run state isolation; per-run token/cache instrumentation; and an org cost reporter. 300+ tests, CI-green. Proven against its own repo and an external Node/TypeScript target.

Stack

Python 3.12 · LangGraph · Claude Agent SDK · SQLite checkpointer · managed with uv

Setup

uv sync                  # provisions Python 3.12 + installs deps
cp .env.example .env     # add your dedicated BLACKSMITH_ANTHROPIC_API_KEY
uv run pytest            # run the test suite
uv run ruff check        # lint

Global install

Install blacksmith as a user-wide tool so it's on your PATH and runnable from inside any repo:

uv tool install .                                          # from a local clone
uv tool install git+https://github.com/smith-and-web/blacksmith   # from the git URL

Once installed, cd into the target repo and run blacksmith <prd> directly (no uv run prefix). blacksmith discovers its blacksmith.config.toml by walking up from the current directory to the git root, and — when [target].repo_path is omitted — operates on that repo (the git root of where you're standing). See Running.

Configuration

blacksmith.config.toml — blacksmith's own runtime config: model tiering, target repo, checkpointer, the long-term memory [store], the run-metrics [metrics] sink, agent [transcripts], and API auth. Parsed by blacksmith/config.py. It's discovered by walking up from the current directory to the git root, so a globally-installed blacksmith works from any nested path inside the repo.
A separate blacksmith.toml lives in each target repo and defines that repo's toolchain — test_cmd, optional lint_cmd, and optional setup_cmd (a one-off provisioning step like npm ci, run before the gate) — read by the test gate.

The dedicated Anthropic API key is read from the env var named in [api].key_env_var (default BLACKSMITH_ANTHROPIC_API_KEY) — never from subscription auth, keeping blacksmith's metered spend isolated.

Running

# from inside the target repo, with blacksmith installed globally:
blacksmith <path/to/prd.md>                 # run a PRD's work units (the whole DAG)
blacksmith <prd> --approve plan,pr          # non-interactive (CI / headless)

# from a clone of blacksmith itself, without a global install:
uv run blacksmith <path/to/prd.md>

Run-inside-the-repo flow. When [target].repo_path is omitted from blacksmith.config.toml (or the [target] section is dropped entirely), blacksmith operates on the git root of the directory you run it from. So you can cd into any repo, commit a blacksmith.config.toml (no repo_path needed) plus a blacksmith.toml, and run blacksmith <prd> from anywhere inside it. Setting an explicit absolute [target].repo_path still works unchanged and takes precedence.

The PRD path is the single positional argument. --config points at a non-default blacksmith.config.toml (otherwise it's discovered by walking up to the git root); --thread-id names the checkpointer thread; a fresh run resets a terminally-finished thread and refuses a paused one, so a reused id won't resurrect prior state — a fresh id per run is still the simplest habit. Interactive runs pause for a y/n at the plan and PR gates; --auto-approve approves both, and --approve plan,pr approves only the gates you name (an unlisted gate is denied, halting the run there). --quiet silences the per-node progress stream.

Other entry points:

blacksmith validate <prd>            # offline contract check — field-level errors, zero model spend
blacksmith resume --thread-id <id>   # continue an interrupted run from its SQLite checkpoint
blacksmith runs [<thread-id>]        # recorded run history; pass a thread-id to drill into one run
blacksmith dashboard                 # local read-only web dashboard over recorded run metrics (localhost)
blacksmith --issue <N>               # scaffold a PRD skeleton from GitHub issue #N (PR links Closes #N)
blacksmith costs                     # org usage + cost from the Admin API (read-only; needs an admin key)

Onboarding a new target repo

blacksmith operates on a target repo via git clones — nothing about blacksmith is compiled into it, and the repo needs no prior setup beyond being a git clone.

blacksmith only sees committed state. Each run cuts a fresh clone from HEAD, so anything it should use must be committed, not just staged — including blacksmith.toml and CLAUDE.md (a staged-but-uncommitted config reads as "no blacksmith.toml found"). That clone also starts with no installed dependencies (node_modules, virtualenvs, etc. are gitignored) — which is what setup_cmd is for.

To point blacksmith at a new project:

Point blacksmith at the clone. Either set [target] repo_path in blacksmith.config.toml to the local path, or omit repo_path and run blacksmith from inside the repo — it then targets the git root of your current directory (see Running). Set default_branch if it isn't main.
Add a blacksmith.toml to the target repo so the test gate knows its toolchain. Because the gate runs in a fresh clone with no installed deps, a Node/TS target needs setup_cmd to install them — test_cmd = "npm test" alone dies with vitest: command not found:
```
# committed in the TARGET repo (e.g. a Node/TypeScript MCP server)
setup_cmd = "npm ci"           # optional; one-off provisioning, runs before the gate
test_cmd = "npm test"
lint_cmd = "npm run lint"      # optional; runs only if tests pass
```
Commands run through a shell, so chains work directly (test_cmd = "npm ci && npm test") with no sh -c wrapper. cargo needs no setup_cmd — it fetches its own deps.
Give it context — commit a CLAUDE.md. For a repo with no claude.ai Project, a root-level CLAUDE.md is how its conventions reach the agent: blacksmith reads the clone's CLAUDE.md and injects it into the implementer's system prompt as project context. For safety it does not load the repo's .claude/settings.json — permissions and hooks are never inherited from a target — and the PRD untouchables always override the repo's own guidance.
Write a Contract v1 PRD for the work — see docs/prd-authoring-guide.md. Set primary_target_repo, declare layers, list untouchables, and define work_units. blacksmith runs the whole work_units DAG in dependency order on one shared branch and opens a single combined PR — use depends_on to order them (independent units at the same level run in parallel). Use an auto layer where the test gate should decide pass/fail end to end; a human layer instead opens a draft PR for manual QA.
Have the claude CLI on PATH — the Agent SDK spawns it for live runs.
Run it — uv run blacksmith path/to/your-prd.md — then approve at the plan and PR gates (or run headless with --approve; see Running).

Worked example — fixing an MCP spec violation. Given an Obsidian MCP server whose tool violates the MCP spec: add blacksmith.toml (npm ci setup + npm test / npm run lint), commit a CLAUDE.md capturing the server's conventions, and write a one-unit PRD whose work unit targets the offending tool's module with a test_contract that asserts the spec-conformant shape. blacksmith plans, implements in an isolated clone, gates on npm test, and opens a PR for review — no claude.ai Project required.

Build progress (WU-01…WU-11 — see PRD §6)

Beyond the v0 spine — built by blacksmith on itself

Multi-unit DAG execution — topological ordering + parallel fan-out within a dependency level, accumulating onto one combined PR.
Clone-based isolation — a throwaway git clone per run/unit; the real checkout is never touched.
Human-QA path — a human-gated unit opens a draft PR and ends AWAITING_QA, branch preserved for manual review.
Tiered models — implement on Sonnet first, escalate to Opus only on a gate failure.
More CLIs — validate (offline contract check), resume (continue from a checkpoint), runs (recorded run history + per-run drill-down), dashboard (local metrics UI), --issue N (scaffold from a GitHub issue), costs (org Admin-API usage/cost), and global uv tool install.
Long-term memory — per-repo gate-failure lessons persisted in a SQLite Store and fed back into the planner's context on later runs.
Observability — per-run/per-unit metrics recorded to a local SQLite, a blacksmith runs history command, a built-in localhost blacksmith dashboard, and per-call agent transcripts linked from each run.
Interactive CLI — rendered Markdown plans, diffs, and test output with a live progress indicator at the gates, degrading to plain, parseable output when piped or under --quiet.
Fresh-run state isolation — a fresh run resets a terminally-finished thread and refuses a paused one, so a reused --thread-id can't resurrect a prior run's errors.
Cost visibility — end-of-run cost total (summed across all units + escalations) plus per-run token + cache-hit instrumentation.
Robustness — repo-consistency preflight, reason-accurate guard-block reporting, graceful executor failures, and a forward-migrating metrics store.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
.github		.github
blacksmith		blacksmith
docs		docs
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
blacksmith-v0-prd.md		blacksmith-v0-prd.md
blacksmith.config.toml		blacksmith.config.toml
blacksmith.toml		blacksmith.toml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

blacksmith

Stack

Setup

Global install

Configuration

Running

Onboarding a new target repo

Build progress (WU-01…WU-11 — see PRD §6)

Beyond the v0 spine — built by blacksmith on itself

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

blacksmith

Stack

Setup

Global install

Configuration

Running

Onboarding a new target repo

Build progress (WU-01…WU-11 — see PRD §6)

Beyond the v0 spine — built by blacksmith on itself

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages