Antinjekt is a high-resilience security framework designed to neutralize "World Class" prompt injections. While standard filters rely on simple regex or basic LLM classification, Antinjekt implements a Zero-Trust Multi-Stage Pipeline that strips instructional intent from data before it ever reaches the primary reasoning agent.
Antinjekt doesn't just "filter" text; it treats every input as a potential biological threat, processing it through an air-gapped decontamination chamber.
The sanitization layer, powered by GPT-OSS 120B, uses Dynamic XML Tagging (UUID boundaries) to trap payloads. It applies a Pydantic-enforced schema to extract only raw facts, ignoring any instructional verbs.
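The UUID-boundary idea can be sketched in a few lines. The helper names below are illustrative, not Antinjekt's actual API: because the tag name is random per request, a payload cannot pre-embed a matching closing tag to escape its container.

```python
import uuid

def wrap_payload(untrusted: str) -> str:
    """Wrap untrusted input in a single-use, randomly named boundary tag."""
    tag = f"data-{uuid.uuid4().hex}"
    return f"<{tag}>\n{untrusted}\n</{tag}>"

def boundary_intact(wrapped: str) -> bool:
    """Check that the boundary tag appears exactly once as open and close,
    i.e. the payload did not forge or duplicate it."""
    tag = wrapped.split(">", 1)[0].lstrip("<")
    return wrapped.count(f"<{tag}>") == 1 and wrapped.count(f"</{tag}>") == 1
```

An attacker would have to guess the 128-bit tag in advance to break out of the container, which is why static delimiters (```, `###`, fixed XML tags) are avoided.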
A local Python controller scans the output for a signature "Hit List." If suspicious tokens are detected, the system triggers an Aggressive Retry, shifting the Sanitizer into a "Maximum Security" posture where any ambiguity results in total payload rejection.
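A minimal sketch of that controller loop, assuming a `sanitize(payload, strict=...)` callable; the hit-list patterns here are illustrative stand-ins, since the real signatures are not published:

```python
import re

# Illustrative signature "hit list": patterns suggesting instructional
# intent survived sanitization.
HIT_LIST = [
    r"ignore (?:all|previous) instructions",
    r"system prompt",
    r"you are now",
]

def scan(output: str) -> bool:
    """Return True if any hit-list signature appears in the output."""
    return any(re.search(p, output, re.IGNORECASE) for p in HIT_LIST)

def sanitize_with_retry(payload, sanitize, max_retries=2):
    """Run the sanitizer; on a hit, escalate to a strict posture in which
    any remaining ambiguity rejects the payload outright."""
    out = sanitize(payload, strict=False)
    if not scan(out):
        return out
    for _ in range(max_retries):
        out = sanitize(payload, strict=True)  # "Maximum Security" posture
        if not scan(out):
            return out
    return None  # total payload rejection
```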
The Ling-2.6-1T (Trillion Parameter) model acts as the brain. Crucially, it never sees the raw user input. It only interacts with the sanitized, "dead" facts provided by the Vault.
Every response is audited by a 3-model committee to ensure zero leakage:
- Safety Judge: Hy3-Preview (High Reasoning)
- Utility Judge: Nemotron 3 Super
- Logic Judge: GLM 4.5 Air
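The committee audit reduces to a unanimity check: a single dissenting judge blocks the response. The `Judge` and `audit` names below are hypothetical stand-ins for the framework's internals:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Judge:
    name: str                        # e.g. "Safety", "Utility", "Logic"
    verdict: Callable[[str], bool]   # True means the response passes

def audit(response: str, judges: List[Judge]) -> bool:
    """Release a response only on a unanimous pass (zero-leakage policy)."""
    return all(judge.verdict(response) for judge in judges)
```

Requiring unanimity across architecturally diverse models means an attack must fool all three judges simultaneously, not just the weakest one.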
We evaluate resilience using the Resilience Index across the following benchmark suites:
| Dataset | Type | Result |
|---|---|---|
| TensorTrust | Indirect/Payload Splitting | % Block Rate |
| JailbreakBench | Direct Behavioral Attacks | % Block Rate |
| HarmBench | Complex Malicious Tasks | % Block Rate |
```
git clone https://github.com/your-username/antinjekt.git
cd antinjekt
conda create -n injekt python=3.10
conda activate injekt
pip install -r requirements.txt
```

Create a `.env` file in the root directory:

```
OPENROUTER_API_KEY=sk-or-v1-your-key
```

Execute the 50-case autonomous optimization loop:

```
python train.py --gauntlet --iterations 100
```

- Autonomous Optimization: Built-in support for "AutoResearch" loops that self-correct prompts based on failure logs.
- Pydantic Enforcement: Guarantees that model outputs never "break character" or output non-JSON garbage.
- Cognitive Neutralization: A unique prompting strategy that converts active commands ("Delete this") into passive observations ("A request for deletion was noted") to preserve data without risk.
- Multi-Model Diversity: Prevents "Single Point of Failure" by utilizing diverse architectures (Mamba, MoE, and Dense Transformers).
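As a toy illustration of the active-to-passive conversion behind Cognitive Neutralization (Antinjekt does this with model prompting, not a lookup table; the templates below are invented):

```python
# Hypothetical verb -> passive-observation templates.
NEUTRAL_TEMPLATES = {
    "delete": "A request for deletion was noted",
    "ignore": "A request to disregard prior context was noted",
    "send": "A request to transmit data was noted",
}

def neutralize(command: str) -> str:
    """Convert an active command into a passive observation, preserving
    the information content while stripping its imperative force."""
    words = command.strip().split()
    verb = words[0].lower() if words else ""
    template = NEUTRAL_TEMPLATES.get(
        verb, "An instruction-like statement was noted"
    )
    return f"{template}: {command!r}"
```

The downstream planner still learns *that* a deletion was requested, but never receives a sentence phrased as an order it could mistakenly obey.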
The framework maintains a detailed `research_notes.md` and `results/benchmark_log.jsonl`, tracking every iteration of the prompt evolution. View your current progress with:

```
python view_logs.py
```

This is an active security research project. If you find a payload that can "break the vault," please open an issue or submit a PR with the new test case added to the Gauntlet.
Developed for the next generation of secure, agentic workflows.