Invariant Biosynthesis

Cryptographic request-validation firewall for AI-controlled biological and chemical synthesis.

  COGNITIVE DOMAIN             INVARIANT FIREWALL            EXECUTION DOMAIN
 +-------------------+     +------------------------+     +--------------------+
 | RFdiffusion       | --> | Verify PCA chain       | --> | DNA synthesizer    |
 | ProteinMPNN       |     | Screen sequences (D/P) |     | Peptide synth      |
 | LLM lab planners  |     | Screen structures  (C) |     | Chemspeed reactor  |
 | Claude agents     |     | Sign approved bundle   |     | Cloud lab API      |
 | Prompt injection  |     | Reject + log denied    |     | Reagent dispenser  |
 | Hallucinations    |     | Watchdog heartbeat     |     | The real world     |
 +-------------------+     +------------------------+     +--------------------+
     UNTRUSTED                 TRUST BOUNDARY                  PROTECTED

Nothing from the cognitive domain reaches a synthesizer without Invariant's Ed25519 signature on top of a deterministic screening verdict. The AI cannot bypass it. The AI cannot modify it. The synthesizer verifies the signature before a single base, amide bond, or reagent moves.
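The synthesizer-side check described above can be sketched as a fail-closed gate. This is a minimal illustration, not the repository's code: `ExecutionToken`, `Refusal`, `SignatureVerifier`, `DemoVerifier`, and `authorize` are hypothetical names, and the toy verifier stands in for a real Ed25519 check (for example via ed25519-dalek).

```rust
/// Hypothetical execution token; the field names are illustrative only.
pub struct ExecutionToken {
    pub kid: String,        // key id of the device the token is addressed to
    pub command: Vec<u8>,   // canonical command bytes
    pub signature: Vec<u8>, // detached signature over `command`
}

#[derive(Debug, PartialEq)]
pub enum Refusal {
    MissingSignature,
    WrongDevice,
    BadSignature,
}

/// Stand-in for a real signature check (assumption: Ed25519 in production).
pub trait SignatureVerifier {
    fn verify(&self, message: &[u8], signature: &[u8]) -> bool;
}

/// Toy verifier for the demo: accepts exactly one pre-agreed byte string.
/// This is NOT cryptography; it only lets the gate logic be exercised.
pub struct DemoVerifier {
    pub accepted: Vec<u8>,
}

impl SignatureVerifier for DemoVerifier {
    fn verify(&self, _message: &[u8], signature: &[u8]) -> bool {
        signature == self.accepted.as_slice()
    }
}

/// Fail-closed gate: the command bytes are released only if every check passes.
/// Any missing or mismatched element yields a refusal, never a default action.
pub fn authorize<'a, V: SignatureVerifier>(
    device_kid: &str,
    verifier: &V,
    token: &'a ExecutionToken,
) -> Result<&'a [u8], Refusal> {
    if token.signature.is_empty() {
        return Err(Refusal::MissingSignature);
    }
    if token.kid != device_kid {
        return Err(Refusal::WrongDevice);
    }
    if !verifier.verify(&token.command, &token.signature) {
        return Err(Refusal::BadSignature);
    }
    Ok(&token.command)
}
```

Returning the command bytes only through `Ok` means a caller cannot reach them without passing all three checks, which is the property "absence of signature means no action" relies on.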

Sibling project: invariant-robotics — the motor-actuation substrate this project extends to biology.

Bottom Line Up Front

AI is now writing DNA, designing proteins, and planning chemical syntheses. The gap between "the model hallucinated" and "the reagent was dispensed" is the same gap robotics faces between "the model hallucinated" and "the actuator moved" — and biology is more irreversible than physics. A released pathogen cannot be patched.

This repository delivers a single Rust workspace, zero-dependency where possible and minimal-dependency otherwise, that closes that gap with a deterministic, cryptographically enforced, fail-closed firewall between AI planners and synthesis hardware.

Problems and Risks

Prompt injection in synthesis planners. An LLM ingesting a paper, a ticket, or a sensor feed can be steered into emitting a bundle that asks for a dangerous sequence.
Hallucinated confidence. Generative protein/chemistry models produce plausible-looking designs with no ground-truth link to safety consequences.
Authority scope creep. AI agents exceed the IRB/IBC-approved protocol — wrong organism, wrong volume, wrong BSL.
Replay and forgery. Signed synthesis commands get replayed, reordered, or forged by a compromised middleware layer.
Screening-database poisoning. Attacker mutates the hazard list so a select-agent sequence passes as benign.
Irreversibility. A synthesized toxin, pathogen fragment, or Schedule I precursor cannot be recalled. Software-level "oops" does not apply.
Audit tampering. Post-incident, logs get rewritten to hide which agent, which prompt, which operator.
Platform spoofing. A non-approved synthesizer (or a compromised one) accepts a command it should have refused.
Cross-institutional leakage. Authority delegated for collaboration A gets used for project B.
Dual-use drift. Individually benign requests compose into a banned capability; no single reviewer sees the whole picture.
Brain crash during run. The AI planner crashes or stalls mid-protocol and leaves hardware in an unsafe intermediate state.
Covert-channel leaks. Sequence or structure data exfiltrated via seemingly innocuous synthesis metadata.

Solutions — all in this one repo

Each problem maps to a deterministic, testable mechanism. All of them are built from a minimal dependency set (ed25519-dalek, sha2, serde, clap, thiserror, chrono, base64, regex, rand) — no database, no network stack in the core, no Python runtime, no machine-learning framework.

#	Problem	Mechanism delivered here
1	Prompt injection	Cognitive outputs are never trusted; they must be wrapped in a `SynthesisBundle` signed under a PCA-chain leaf whose ops are already narrowed below the dangerous set.
2	Hallucinated confidence	Deterministic invariant set (D1–D10 DNA, P1–P10 peptide, C1–C10 chemical) runs after the AI — model confidence is ignored, math decides.
3	Authority scope creep	PCA chain with monotonic scope narrowing: no hop can expand permissions, ever. Verified cryptographically, not by convention.
4	Replay / forgery	Ed25519 signatures + nonces + timestamp windows + sequence numbers on every bundle, verdict, and execution token.
5	DB poisoning	Screening-database updates are themselves signed payloads with hash-chained history; unsigned or stale DBs fail closed.
6	Irreversibility	Fail-closed default. The synthesizer requires a signed execution token per command — absence of signature means no action.
7	Audit tampering	Append-only JSONL audit log with per-entry Ed25519 signatures and a SHA-256 hash chain; tamper detection in O(n). Optional Merkle-root witness replication for external notarization.
8	Platform spoofing	Each synthesizer holds its own keypair; firewall binds execution tokens to a specific `kid` and the hardware refuses tokens addressed to a different device.
9	Cross-institutional leakage	Op-scope algebra (intersection / subset verification) proves every hop's ops are a strict subset of the parent's.
10	Dual-use drift	Runtime threat-scoring monitor aggregates across bundles per operator / per session; composite scores cross thresholds even when individuals do not.
11	Brain crash	Watchdog heartbeat; absence of heartbeat triggers a signed `SafeStopAction::HaltSynthesis` token that synthesizers honor by default.
12	Covert channels	Bundles are canonicalized before signing; extra/unknown fields are rejected rather than ignored, closing the obvious exfiltration side-path.
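The monotonic scope narrowing in rows 3 and 9 can be illustrated with plain set operations. The sketch below is an assumption-laden example, not the repository's op-scope algebra: the `Ops` alias, `ScopeError`, `ops`, `verify_narrowing`, and the op names used in the usage note are all hypothetical. It accepts a hop whose ops equal its parent's; enforcing the strict-subset reading of row 9 would need an additional length comparison.

```rust
use std::collections::BTreeSet;

/// Hypothetical representation of an op scope: a set of operation names.
pub type Ops = BTreeSet<String>;

#[derive(Debug, PartialEq)]
pub enum ScopeError {
    EmptyChain,
    /// The hop at index `hop` requested ops its parent does not hold.
    Expanded { hop: usize, extra: Vec<String> },
}

/// Convenience constructor for examples.
pub fn ops(items: &[&str]) -> Ops {
    items.iter().map(|s| s.to_string()).collect()
}

/// Walks the chain root-first and checks that each hop's ops are a subset
/// of the previous hop's ops. Returns the leaf scope on success.
pub fn verify_narrowing(chain: &[Ops]) -> Result<&Ops, ScopeError> {
    let mut parent = chain.first().ok_or(ScopeError::EmptyChain)?;
    for (i, hop) in chain.iter().enumerate().skip(1) {
        // Anything in `hop` that is not in `parent` is an expansion.
        let extra: Vec<String> = hop.difference(parent).cloned().collect();
        if !extra.is_empty() {
            return Err(ScopeError::Expanded { hop: i, extra });
        }
        parent = hop;
    }
    Ok(parent)
}
```

For example, a chain of `{dna.synthesize, dna.screen, peptide.synthesize}` → `{dna.synthesize, dna.screen}` → `{dna.screen}` verifies, while a leaf that adds `chem.dispense` is rejected with the offending op named in the error. In the real system this set check would sit alongside signature verification of each hop, which the sketch omits.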

Plus the meta-solutions the robotics sibling has proven out:

Differential validation — dual-instance verdict comparison to catch single-node compromise.
Shadow → Guardian → Autonomous staged deployment pipeline with statistical acceptance gates (Clopper-Pearson bounds).
Proof package generation so an external reviewer can replay a full campaign and verify every verdict.
Adversarial suites (protocol, authority, system, cognitive) that run zero-escape regression tests in CI.
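To make the Clopper-Pearson acceptance gate concrete, here is a small sketch for the zero-escape case. It assumes a one-sided bound and zero observed escapes, neither of which the text above specifies; the function names are hypothetical. With zero failures in n independent trials, the upper limit p solves (1 - p)^n = alpha, so p = 1 - alpha^(1/n).

```rust
/// One-sided Clopper-Pearson upper bound on the per-trial escape probability
/// after `trials` independent runs with zero observed escapes.
/// Returns None for inputs where the bound is undefined.
pub fn zero_escape_upper_bound(trials: u64, alpha: f64) -> Option<f64> {
    if trials == 0 || !(alpha > 0.0 && alpha < 1.0) {
        return None;
    }
    // (1 - p)^n = alpha  =>  p = 1 - alpha^(1/n)
    Some(1.0 - alpha.powf(1.0 / trials as f64))
}

/// Smallest number of zero-escape trials needed so that the upper bound
/// drops to `max_rate` or below at significance level `alpha`.
pub fn trials_needed(max_rate: f64, alpha: f64) -> Option<u64> {
    let valid = |x: f64| x > 0.0 && x < 1.0;
    if !valid(max_rate) || !valid(alpha) {
        return None;
    }
    // (1 - p)^n <= alpha  <=>  n >= ln(alpha) / ln(1 - p)
    Some((alpha.ln() / (1.0 - max_rate).ln()).ceil() as u64)
}
```

At alpha = 0.05 this reproduces the familiar "rule of three": 300 clean trials bound the escape rate at roughly 1%. A gate that allows some observed failures would need the general form based on the beta distribution, which is outside this sketch.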

Repository status

The original ten-step build plan (Steps 0–10 in docs/) is shipped at the design + skeleton layer. A follow-up gap-closure plan in spec.md fills in the runtime — biological invariants, screening pipeline, attestation verifier, eleven CLI subcommands, profile library, sim/eval/fuzz harnesses, and CI integration.

Original ten-step build plan

Step	Deliverable	Status
0	Reuse manifest + workspace bootstrap	✅ Code shipped
1	Reuse map	✅ docs/step1-reuse-map.md
2	Threat model	✅ docs/threat-model.md
3	Biological invariant set (D/P/C)	✅ Code shipped (D1–D10, P1–P10, C1–C10 + 4 protocol invariants — gap-closure Steps 6–9)
4	PCA chain for research authorization	✅ docs/step4-pca-research-auth.md
5	Synthesis platform integration	✅ docs/step5-platform-integration.md — runtime CLI + validator pipeline shipped (gap-closure Steps 5, 11)
6	Screening databases	✅ docs/step6-screening-databases.md — `FileBackedHazardDatabase` with signed JSON, hits surfaced into the validator (gap-closure Step 4)
7	HSM and key management	✅ docs/step7-hsm-key-mgmt.md
8	Testing and validation pipeline	✅ docs/step8-testing-validation.md — sim / eval / fuzz / E2E CLI tests shipped (gap-closure Steps 15–17, 19)
9	Regulatory compliance	✅ docs/step9-regulatory-compliance.md
10	Community and ecosystem	✅ docs/step10-community-ecosystem.md

Gap-closure plan (spec.md)

Step	Deliverable	Status
1	Clean baseline (build/test/clippy green)	✅
2	`Unimplemented` invariant fail-closed policy	✅
3	NCBI table-1 codon translation (3-frame)	✅
4	`FileBackedHazardDatabase` (signed JSON + Ed25519)	✅
5	Hazard DB wired into validator pipeline	✅
6	DNA invariants D1–D10 implemented	✅
7	Peptide invariants P1–P10 implemented	✅
8	Chemical invariants C1–C10 implemented	✅
9	Protocol invariant pipeline (PR1–PR4)	✅
10	Attestation logic + replay cache + validator wiring	✅
11	`validate` CLI subcommand	✅
12	`inspect` CLI subcommand	✅
13	`differential` CLI subcommand	✅
14	`intent` CLI subcommand + 9-template registry	✅
15	Simulation harness + `campaign` CLI	✅
16	Trace evaluation engine + `eval` CLI	✅
17	Adversarial fuzz suite + `adversarial` CLI	✅
18	Bio-profile library (6 profiles)	✅
19	E2E CLI integration tests in CI	✅
20	README / CHANGELOG reconciliation	✅ (this entry)
21	Final fmt / clippy / docs / deny / TODO sweep	⏳
22	Open gap-closure pull request	⏳
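As an illustration of Step 3, the following sketch translates a DNA string in three reading frames using NCBI translation table 1 (the standard genetic code). It assumes the three frames are the forward frames only and that ambiguous bases map to `X`; the function names are hypothetical and this is not the repository's implementation.

```rust
/// NCBI translation table 1 (standard code). Codons are indexed with the
/// base order T, C, A, G: index = 16 * first + 4 * second + third.
const TABLE_1: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Maps a nucleotide to its position in the T, C, A, G ordering.
fn base_index(b: u8) -> Option<usize> {
    match b.to_ascii_uppercase() {
        b'T' | b'U' => Some(0),
        b'C' => Some(1),
        b'A' => Some(2),
        b'G' => Some(3),
        _ => None,
    }
}

/// Translates one 3-base codon; any non-ACGTU base yields 'X'.
fn translate_codon(codon: &[u8]) -> char {
    let mut idx = 0usize;
    for &b in codon {
        match base_index(b) {
            Some(i) => idx = idx * 4 + i,
            None => return 'X',
        }
    }
    TABLE_1[idx] as char
}

/// Translates the three forward reading frames (offsets 0, 1, 2).
/// Trailing bases that do not fill a codon are dropped; '*' marks a stop.
pub fn translate_three_frames(dna: &str) -> [String; 3] {
    let bytes = dna.as_bytes();
    [0usize, 1, 2].map(|offset| {
        bytes
            .get(offset..)
            .unwrap_or_default()
            .chunks_exact(3)
            .map(translate_codon)
            .collect::<String>()
    })
}
```

For the input `ATGGCCTAA`, frame 0 reads ATG GCC TAA and yields `MA*`; frames 1 and 2 yield `WP` and `GL`. Screening the reverse complement as well would double the frame count and is not shown here.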

Test counts

invariant-biosynthesis-core: 403 tests
invariant-biosynthesis-cli: 95 lib + 15 integration
invariant-biosynthesis-eval: 12 lib + 3 integration
invariant-biosynthesis-sim: 5
invariant-biosynthesis-fuzz: 7
22 doc-tests

Known gaps (deferred)

Real cheminformatics: C-family invariants treat SMILES as opaque strings with regex heuristics; SMARTS-based substructure matching, RDKit/OpenBabel integration, and canonicalisation are deferred.
Real homology engines: D-family invariants delegate sequence/protein matching to the `HazardScreener` trait. The reference `FileBackedHazardDatabase` ships regex pattern matching; HMMER/BLAST/k-mer engines are not bundled.
Validator-side codon-usage host hint: D7 entropy band is currently fixed; profile-driven host hints are a future extension.
Cross-bundle fragmentation detection: D10 exposes only the stateless variant; the `StatefulInvariant` orchestration layer is the existing path for cross-bundle state and is not exercised in the validator yet.
Profile-driven step vocabularies: PR2 uses a built-in 25-verb whitelist; per-profile overrides are a future extension.
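To show what plugging a different engine behind a screening trait might look like, here is a toy exact-match k-mer screener. Everything in it is an assumption: the real `HazardScreener` trait's signature is not shown in this README, so the sketch defines its own `HazardScreenerSketch` trait, and exact k-mer lookup is far weaker than HMMER or BLAST homology search.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a screening trait; NOT the project's real API.
pub trait HazardScreenerSketch {
    /// Returns the start offsets of every window that matches a hazard k-mer.
    fn screen(&self, sequence: &str) -> Vec<usize>;
}

/// Exact-match k-mer index built from a list of hazard sequences.
pub struct KmerScreener {
    k: usize,
    hazards: HashSet<Vec<u8>>,
}

impl KmerScreener {
    /// Builds the index. Returns None for k == 0 so that a misconfigured
    /// screener cannot be constructed and silently report no hits.
    pub fn new(k: usize, hazard_sequences: &[&str]) -> Option<Self> {
        if k == 0 {
            return None;
        }
        let mut hazards = HashSet::new();
        for seq in hazard_sequences {
            let upper = seq.to_ascii_uppercase().into_bytes();
            for window in upper.windows(k) {
                hazards.insert(window.to_vec());
            }
        }
        Some(Self { k, hazards })
    }
}

impl HazardScreenerSketch for KmerScreener {
    fn screen(&self, sequence: &str) -> Vec<usize> {
        let upper = sequence.to_ascii_uppercase().into_bytes();
        upper
            .windows(self.k)
            .enumerate()
            .filter(|(_, window)| self.hazards.contains(*window))
            .map(|(offset, _)| offset)
            .collect()
    }
}
```

With a placeholder hazard entry `ACGTACGT` and k = 4, the query `ttacgttt` is flagged at offsets 1 and 2. A production engine would also need to handle reverse complements, near matches, and translated protein space, none of which this sketch attempts.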
