Skip to content

lucas304dn/Antinjekt

Repository files navigation

🛡️ Antinjekt: The Impervious Vault

Autonomous Multi-Agent Defense Against Indirect Prompt Injections(W.I.P.)

License: MIT Python 3.10+ Security: Impervious

Antinjekt is a high-resilience security framework designed to neutralize "World Class" prompt injections. While standard filters rely on simple regex or basic LLM classification, Antinjekt implements a Zero-Trust Multi-Stage Pipeline that strips instructional intent from data before it ever reaches the primary reasoning agent.


The Architecture: "The Vault"

Antinjekt doesn't just "filter" text; it treats every input as a potential biological threat, processing it through an air-gapped decontamination chamber.

1. The Sanitization Layer (Pass 1)

Powered by GPT-OSS 120B, this layer uses Dynamic XML Tagging (UUID Boundaries) to trap payloads. It uses a Pydantic-enforced schema to extract only raw facts, ignoring any instructional verbs.

2. The Triage & Feedback Loop

A local Python controller scans the output for a signature "Hit List." If suspicious tokens are detected, the system triggers an Aggressive Retry, shifting the Sanitizer into a "Maximum Security" posture where any ambiguity results in total payload rejection.

3. The Main Actor

The Ling-2.6-1T (Trillion Parameter) model acts as the brain. Crucially, it never sees the raw user input. It only interacts with the sanitized, "dead" facts provided by the Vault.

4. The Jury (Consensus Benchmarking)

Every response is audited by a 3-model committee to ensure zero leakage:

  • Safety Judge: Hy3-Preview (High Reasoning)
  • Utility Judge: Nemotron 3 Super
  • Logic Judge: GLM 4.5 Air

📊 The Gauntlet: Benchmark Performance

We evaluate resilience using the Resilience Index ($RI$), a harmonic balance between Safety (blocking attacks) and Utility (retaining useful data).

$$RI = \frac{Safety \times Utility}{Total^2}$$

Dataset Type Result
TensorTrust Indirect/Payload Splitting % Block Rate
JailbreakBench Direct Behavioral Attacks % Block Rate
HarmBench Complex Malicious Tasks % Block Rate

🚀 Quick Start

1. Clone & Setup

git clone https://github.com/your-username/antinjekt.git
cd antinjekt
conda create -n injekt python=3.10
conda activate injekt
pip install -r requirements.txt

2. Configure Environment

Create a .env file in the root directory:

OPENROUTER_API_KEY=sk-or-v1-your-key

3. Run the Gauntlet

Execute the 50-case autonomous optimization loop:

python train.py --gauntlet --iterations 100

Key Features

  • Autonomous Optimization: Built-in support for "AutoResearch" loops that self-correct prompts based on failure logs.
  • Pydantic Enforcement: Guarantees that model outputs never "break character" or output non-JSON garbage.
  • Cognitive Neutralization: A unique prompting strategy that converts active commands ("Delete this") into passive observations ("A request for deletion was noted") to preserve data without risk.
  • Multi-Model Diversity: Prevents "Single Point of Failure" by utilizing diverse architectures (Mamba, MoE, and Dense Transformers).

Research & Logs

The framework maintains a detailed research_notes.md and results/benchmark_log.jsonl, tracking every iteration of the prompt evolution. View your current progress with:

python view_logs.py

Contributing

This is an active security research project. If you find a payload that can "break the vault," please open an issue or submit a PR with the new test case added to the Gauntlet.


Developed for the next generation of secure, agentic workflows.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors