Cryptographic request-validation firewall for AI-controlled biological and chemical synthesis.

      COGNITIVE DOMAIN            INVARIANT FIREWALL            EXECUTION DOMAIN
    +-------------------+     +------------------------+     +--------------------+
    | RFdiffusion       | --> | Verify PCA chain       | --> | DNA synthesizer    |
    | ProteinMPNN       |     | Screen sequences (D/P) |     | Peptide synth      |
    | LLM lab planners  |     | Screen structures (C)  |     | Chemspeed reactor  |
    | Claude agents     |     | Sign approved bundle   |     | Cloud lab API      |
    | Prompt injection  |     | Reject + log denied    |     | Reagent dispenser  |
    | Hallucinations    |     | Watchdog heartbeat     |     | The real world     |
    +-------------------+     +------------------------+     +--------------------+
          UNTRUSTED                 TRUST BOUNDARY                 PROTECTED

Nothing from the cognitive domain reaches a synthesizer without Invariant's Ed25519 signature on top of a deterministic screening verdict. The AI cannot bypass it. The AI cannot modify it. The synthesizer verifies the signature before a single base, amide bond, or reagent moves.
Sibling project: invariant-robotics — the motor-actuation substrate this project extends to biology.
AI is now writing DNA, designing proteins, and planning chemical syntheses. The gap between "the model hallucinated" and "the reagent was dispensed" is the same gap robotics faces between "the model hallucinated" and "the actuator moved" — and biology is more irreversible than physics. A released pathogen cannot be patched.
This repository delivers a single Rust workspace, zero-dependency where possible and minimal-dependency otherwise, that closes that gap with a deterministic, cryptographically enforced, fail-closed firewall between AI planners and synthesis hardware.
- Prompt injection in synthesis planners. An LLM ingesting a paper, a ticket, or a sensor feed can be steered into emitting a bundle that asks for a dangerous sequence.
- Hallucinated confidence. Generative protein/chemistry models produce plausible-looking designs with no ground-truth link to safety consequences.
- Authority scope creep. AI agents exceed the IRB/IBC-approved protocol — wrong organism, wrong volume, wrong BSL.
- Replay and forgery. Signed synthesis commands get replayed, reordered, or forged by a compromised middleware layer.
- Screening-database poisoning. An attacker mutates the hazard list so that a select-agent sequence passes as benign.
- Irreversibility. A synthesized toxin, pathogen fragment, or Schedule I precursor cannot be recalled. Software-level "oops" does not apply.
- Audit tampering. Post-incident, logs get rewritten to hide which agent, which prompt, which operator.
- Platform spoofing. A non-approved synthesizer (or a compromised one) accepts a command it should have refused.
- Cross-institutional leakage. Authority delegated for collaboration A gets used for project B.
- Dual-use drift. Individually benign requests compose into a banned capability; no single reviewer sees the whole picture.
- Brain crash during run. The AI planner crashes or stalls mid-protocol and leaves hardware in an unsafe intermediate state.
- Covert-channel leaks. Sequence or structure data exfiltrated via seemingly innocuous synthesis metadata.
Each problem maps to a deterministic, testable mechanism. All of them are built from a minimal dependency set (ed25519-dalek, sha2, serde, clap, thiserror, chrono, base64, regex, rand) — no database, no network stack in the core, no Python runtime, no machine-learning framework.
| # | Problem | Mechanism delivered here |
|---|---|---|
| 1 | Prompt injection | Cognitive outputs are never trusted; they must be wrapped in a SynthesisBundle signed under a PCA-chain leaf whose ops are already narrowed below the dangerous set. |
| 2 | Hallucinated confidence | Deterministic invariant set (D1–D10 DNA, P1–P10 peptide, C1–C10 chemical) runs after the AI — model confidence is ignored, math decides. |
| 3 | Authority scope creep | PCA chain with monotonic scope narrowing: no hop can expand permissions, ever. Verified cryptographically, not by convention. |
| 4 | Replay / forgery | Ed25519 signatures + nonces + timestamp windows + sequence numbers on every bundle, verdict, and execution token. |
| 5 | DB poisoning | Screening-database updates are themselves signed payloads with hash-chained history; unsigned or stale DBs fail closed. |
| 6 | Irreversibility | Fail-closed default. The synthesizer requires a signed execution token per command — absence of signature means no action. |
| 7 | Audit tampering | Append-only JSONL audit log with per-entry Ed25519 signatures and a SHA-256 hash chain; tamper detection in O(n). Optional Merkle-root witness replication for external notarization. |
| 8 | Platform spoofing | Each synthesizer holds its own keypair; firewall binds execution tokens to a specific kid and the hardware refuses tokens addressed to a different device. |
| 9 | Cross-institutional leakage | Op-scope algebra (intersection / subset verification) proves every hop's ops are a strict subset of the parent's. |
| 10 | Dual-use drift | Runtime threat-scoring monitor aggregates across bundles per operator / per session; composite scores cross thresholds even when individuals do not. |
| 11 | Brain crash | Watchdog heartbeat; absence of heartbeat triggers a signed SafeStopAction::HaltSynthesis token that synthesizers honor by default. |
| 12 | Covert channels | Bundles are canonicalized before signing; extra/unknown fields are rejected rather than ignored, closing the obvious exfiltration side-path. |
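Rows 3 and 9 both rest on one check: walking the chain and confirming that no hop holds an op its parent lacks. The sketch below illustrates that check only. The `Hop` struct, the `hop` helper, `verify_narrowing`, and the op names are hypothetical and are not this project's API; per-hop signature verification is omitted because it needs `ed25519-dalek`. It also accepts a hop whose ops equal its parent's, so a strict-subset rule would need an extra inequality test.

```rust
use std::collections::BTreeSet;

/// One hop in a hypothetical authority chain: who holds it and which ops it grants.
struct Hop {
    holder: String,
    ops: BTreeSet<String>,
}

/// Convenience constructor used only by this example.
fn hop(holder: &str, ops: &[&str]) -> Hop {
    Hop {
        holder: holder.to_string(),
        ops: ops.iter().map(|s| s.to_string()).collect(),
    }
}

/// Ok(()) when every hop's ops are contained in its parent's ops.
/// Otherwise returns the holder of the first hop that widens scope.
/// An empty chain is rejected, so the check fails closed.
fn verify_narrowing(chain: &[Hop]) -> Result<(), String> {
    if chain.is_empty() {
        return Err("empty chain".to_string());
    }
    for pair in chain.windows(2) {
        let (parent, child) = (&pair[0], &pair[1]);
        if !child.ops.is_subset(&parent.ops) {
            return Err(child.holder.clone());
        }
    }
    Ok(())
}

fn main() {
    // Each hop keeps or drops ops, never adds one.
    let narrowing = [
        hop("institution", &["dna.synthesize", "peptide.synthesize", "chem.dispense"]),
        hop("lab", &["dna.synthesize", "peptide.synthesize"]),
        hop("agent", &["dna.synthesize"]),
    ];
    assert!(verify_narrowing(&narrowing).is_ok());

    // The agent hop claims an op its parent never held.
    let widened = [
        hop("institution", &["dna.synthesize"]),
        hop("agent", &["dna.synthesize", "chem.dispense"]),
    ];
    assert_eq!(verify_narrowing(&widened), Err("agent".to_string()));
    println!("scope-narrowing checks passed");
}
```

Using an ordered set keeps the comparison deterministic and independent of the order in which ops were listed in a hop.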
Plus the meta-solutions the robotics sibling has proven out:
- Differential validation — dual-instance verdict comparison to catch single-node compromise.
- Shadow → Guardian → Autonomous staged deployment pipeline with statistical acceptance gates (Clopper-Pearson bounds).
- Proof package generation so an external reviewer can replay a full campaign and verify every verdict.
- Adversarial suites (protocol, authority, system, cognitive) that run zero-escape regression tests in CI.
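The Clopper-Pearson gate mentioned in the staged-deployment bullet can be made concrete with a small calculation. The sketch below computes a one-sided upper Clopper-Pearson bound on the escape rate by bisecting on the binomial CDF, then compares it to a threshold. The function names, the one-sided form, the 0.05 significance level, and the 0.005 threshold are assumptions for illustration; the source does not state which parameters the pipeline uses.

```rust
/// P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid underflow.
fn binom_cdf(k: u64, n: u64, p: f64) -> f64 {
    if k >= n || p <= 0.0 {
        return 1.0;
    }
    if p >= 1.0 {
        return 0.0; // k < n here, so every trial fails and X = n > k
    }
    let (ln_p, ln_q) = (p.ln(), (1.0 - p).ln());
    let mut ln_choose = 0.0; // ln C(n, 0)
    let mut sum = 0.0;
    for i in 0..=k {
        sum += (ln_choose + i as f64 * ln_p + (n - i) as f64 * ln_q).exp();
        // ln C(n, i + 1) = ln C(n, i) + ln(n - i) - ln(i + 1)
        ln_choose += ((n - i) as f64).ln() - ((i + 1) as f64).ln();
    }
    sum.min(1.0)
}

/// One-sided upper Clopper-Pearson bound: the rate p at which observing
/// `failures` or fewer in `trials` has probability exactly `alpha`.
fn clopper_pearson_upper(failures: u64, trials: u64, alpha: f64) -> f64 {
    if trials == 0 || failures >= trials {
        return 1.0; // no evidence, or all trials failed: assume the worst
    }
    let (mut lo, mut hi) = (0.0_f64, 1.0_f64);
    for _ in 0..200 {
        let mid = 0.5 * (lo + hi);
        // The CDF decreases as p grows, so a CDF above alpha means p is too small.
        if binom_cdf(failures, trials, mid) > alpha {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    hi // return the conservative end of the bracket
}

/// Hypothetical acceptance gate: pass only if the upper bound is within `max_rate`.
fn passes_gate(failures: u64, trials: u64, alpha: f64, max_rate: f64) -> bool {
    clopper_pearson_upper(failures, trials, alpha) <= max_rate
}

fn main() {
    // With zero failures the bound has the closed form 1 - alpha^(1/n).
    let closed_form = 1.0 - 0.05_f64.powf(1.0 / 1000.0);
    assert!((clopper_pearson_upper(0, 1000, 0.05) - closed_form).abs() < 1e-9);

    // Zero escapes in 1000 trials clears a 0.5% gate; zero in 100 does not.
    assert!(passes_gate(0, 1000, 0.05, 0.005));
    assert!(!passes_gate(0, 100, 0.05, 0.005));
    println!("upper bound for 0/1000: {:.6}", clopper_pearson_upper(0, 1000, 0.05));
}
```

The point of gating on the bound rather than the observed rate is that a clean run over too few trials still fails: zero escapes in 100 trials cannot rule out a rate near 3%.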
The original ten-step build plan (Steps 0–10 in docs/) is shipped at the
design + skeleton layer. A follow-up gap-closure plan in spec.md
fills in the runtime — biological invariants, screening pipeline, attestation
verifier, eleven CLI subcommands, profile library, sim/eval/fuzz harnesses,
and CI integration.
| Step | Deliverable | Status |
|---|---|---|
| 0 | Reuse manifest + workspace bootstrap | ✅ Code shipped |
| 1 | Reuse map | ✅ docs/step1-reuse-map.md |
| 2 | Threat model | ✅ docs/threat-model.md |
| 3 | Biological invariant set (D/P/C) | ✅ Code shipped (D1–D10, P1–P10, C1–C10 + 4 protocol invariants — gap-closure Steps 6–9) |
| 4 | PCA chain for research authorization | ✅ docs/step4-pca-research-auth.md |
| 5 | Synthesis platform integration | ✅ docs/step5-platform-integration.md — runtime CLI + validator pipeline shipped (gap-closure Steps 5, 11) |
| 6 | Screening databases | ✅ docs/step6-screening-databases.md — FileBackedHazardDatabase with signed JSON, hits surfaced into the validator (gap-closure Step 4) |
| 7 | HSM and key management | ✅ docs/step7-hsm-key-mgmt.md |
| 8 | Testing and validation pipeline | ✅ docs/step8-testing-validation.md — sim / eval / fuzz / E2E CLI tests shipped (gap-closure Steps 15–17, 19) |
| 9 | Regulatory compliance | ✅ docs/step9-regulatory-compliance.md |
| 10 | Community and ecosystem | ✅ docs/step10-community-ecosystem.md |

Gap-closure plan (spec.md):

| Step | Deliverable | Status |
|---|---|---|
| 1 | Clean baseline (build/test/clippy green) | ✅ |
| 2 | Unimplemented invariant fail-closed policy | ✅ |
| 3 | NCBI table-1 codon translation (3-frame) | ✅ |
| 4 | FileBackedHazardDatabase (signed JSON + Ed25519) | ✅ |
| 5 | Hazard DB wired into validator pipeline | ✅ |
| 6 | DNA invariants D1–D10 implemented | ✅ |
| 7 | Peptide invariants P1–P10 implemented | ✅ |
| 8 | Chemical invariants C1–C10 implemented | ✅ |
| 9 | Protocol invariant pipeline (PR1–PR4) | ✅ |
| 10 | Attestation logic + replay cache + validator wiring | ✅ |
| 11 | validate CLI subcommand | ✅ |
| 12 | inspect CLI subcommand | ✅ |
| 13 | differential CLI subcommand | ✅ |
| 14 | intent CLI subcommand + 9-template registry | ✅ |
| 15 | Simulation harness + campaign CLI | ✅ |
| 16 | Trace evaluation engine + eval CLI | ✅ |
| 17 | Adversarial fuzz suite + adversarial CLI | ✅ |
| 18 | Bio-profile library (6 profiles) | ✅ |
| 19 | E2E CLI integration tests in CI | ✅ |
| 20 | README / CHANGELOG reconciliation | ✅ (this entry) |
| 21 | Final fmt / clippy / docs / deny / TODO sweep | ⏳ |
| 22 | Open gap-closure pull request | ⏳ |
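Step 3 in the table above refers to three-frame translation with NCBI translation table 1 (the standard genetic code). As a rough illustration of what that involves, the sketch below translates the three forward reading frames of a DNA string. The function names and the choice to emit `X` for codons containing ambiguous bases are assumptions, not the crate's actual behavior, and reverse-complement frames are not covered.

```rust
/// Amino acids for NCBI translation table 1, with codons enumerated in
/// T, C, A, G order for each of the three positions ('*' marks a stop).
const AAS: &[u8; 64] = b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Map a nucleotide to its position in the T, C, A, G ordering.
fn base_index(b: u8) -> Option<usize> {
    match b.to_ascii_uppercase() {
        b'T' | b'U' => Some(0),
        b'C' => Some(1),
        b'A' => Some(2),
        b'G' => Some(3),
        _ => None,
    }
}

/// Translate one forward frame starting at `offset` (0, 1, or 2).
/// A trailing partial codon is dropped; a codon with an unrecognized base yields 'X'.
fn translate_frame(dna: &str, offset: usize) -> String {
    dna.as_bytes()
        .get(offset..)
        .unwrap_or_default()
        .chunks_exact(3)
        .map(|codon| {
            match (base_index(codon[0]), base_index(codon[1]), base_index(codon[2])) {
                (Some(i), Some(j), Some(k)) => AAS[16 * i + 4 * j + k] as char,
                _ => 'X',
            }
        })
        .collect()
}

/// Translate all three forward reading frames.
fn translate_three_frames(dna: &str) -> [String; 3] {
    [
        translate_frame(dna, 0),
        translate_frame(dna, 1),
        translate_frame(dna, 2),
    ]
}

fn main() {
    let frames = translate_three_frames("ATGGCCTAA");
    assert_eq!(frames[0], "MA*"); // ATG GCC TAA
    assert_eq!(frames[1], "WP"); // TGG CCT, trailing AA dropped
    assert_eq!(frames[2], "GL"); // GGC CTA, trailing A dropped
    println!("{:?}", frames);
}
```

Translating every frame matters for screening because a hazardous protein encoded one or two bases out of phase would be invisible to a frame-0-only check.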
- invariant-biosynthesis-core: 403 tests
- invariant-biosynthesis-cli: 95 lib + 15 integration
- invariant-biosynthesis-eval: 12 lib + 3 integration
- invariant-biosynthesis-sim: 5
- invariant-biosynthesis-fuzz: 7
- 22 doc-tests
- Real cheminformatics: C-family invariants treat SMILES as opaque strings with regex heuristics; SMARTS-based substructure matching, RDKit/OpenBabel integration, and canonicalisation are deferred.
- Real homology engines: D-family invariants delegate sequence/protein matching to the HazardScreener trait. The reference FileBackedHazardDatabase ships regex pattern matching; HMMER/BLAST/k-mer engines are not bundled.
- Validator-side codon-usage host hint: D7 entropy band is currently fixed; profile-driven host hints are a future extension.
- Cross-bundle fragmentation detection: D10 exposes only the stateless variant; the StatefulInvariant orchestration layer is the existing path for cross-bundle state and is not exercised in the validator yet.
- Profile-driven step vocabularies: PR2 uses a built-in 25-verb whitelist; per-profile overrides are a future extension.