🛡️ Antinjekt: The Impervious Vault

Autonomous Multi-Agent Defense Against Indirect Prompt Injections(W.I.P.)

Antinjekt is a high-resilience security framework designed to neutralize "World Class" prompt injections. While standard filters rely on simple regex or basic LLM classification, Antinjekt implements a Zero-Trust Multi-Stage Pipeline that strips instructional intent from data before it ever reaches the primary reasoning agent.

The Architecture: "The Vault"

Antinjekt doesn't just "filter" text; it treats every input as a potential biological threat, processing it through an air-gapped decontamination chamber.

1. The Sanitization Layer (Pass 1)

Powered by GPT-OSS 120B, this layer uses Dynamic XML Tagging (UUID Boundaries) to trap payloads. It uses a Pydantic-enforced schema to extract only raw facts, ignoring any instructional verbs.

2. The Triage & Feedback Loop

A local Python controller scans the output for a signature "Hit List." If suspicious tokens are detected, the system triggers an Aggressive Retry, shifting the Sanitizer into a "Maximum Security" posture where any ambiguity results in total payload rejection.

3. The Main Actor

The Ling-2.6-1T (Trillion Parameter) model acts as the brain. Crucially, it never sees the raw user input. It only interacts with the sanitized, "dead" facts provided by the Vault.

4. The Jury (Consensus Benchmarking)

Every response is audited by a 3-model committee to ensure zero leakage:

Safety Judge: Hy3-Preview (High Reasoning)
Utility Judge: Nemotron 3 Super
Logic Judge: GLM 4.5 Air

📊 The Gauntlet: Benchmark Performance

We evaluate resilience using the Resilience Index ($RI$), a harmonic balance between Safety (blocking attacks) and Utility (retaining useful data).

$$RI = \frac{Safety \times Utility}{Total^2}$$

Dataset	Type	Result
TensorTrust	Indirect/Payload Splitting	% Block Rate
JailbreakBench	Direct Behavioral Attacks	% Block Rate
HarmBench	Complex Malicious Tasks	% Block Rate

🚀 Quick Start

1. Clone & Setup

git clone https://github.com/your-username/antinjekt.git
cd antinjekt
conda create -n injekt python=3.10
conda activate injekt
pip install -r requirements.txt

2. Configure Environment

Create a .env file in the root directory:

OPENROUTER_API_KEY=sk-or-v1-your-key

3. Run the Gauntlet

Execute the 50-case autonomous optimization loop:

python train.py --gauntlet --iterations 100

Key Features

Autonomous Optimization: Built-in support for "AutoResearch" loops that self-correct prompts based on failure logs.
Pydantic Enforcement: Guarantees that model outputs never "break character" or output non-JSON garbage.
Cognitive Neutralization: A unique prompting strategy that converts active commands ("Delete this") into passive observations ("A request for deletion was noted") to preserve data without risk.
Multi-Model Diversity: Prevents "Single Point of Failure" by utilizing diverse architectures (Mamba, MoE, and Dense Transformers).

Research & Logs

The framework maintains a detailed research_notes.md and results/benchmark_log.jsonl, tracking every iteration of the prompt evolution. View your current progress with:

python view_logs.py

Contributing

This is an active security research project. If you find a payload that can "break the vault," please open an issue or submit a PR with the new test case added to the Gauntlet.

Developed for the next generation of secure, agentic workflows.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
HarmBench		HarmBench
data		data
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
analysis.ipynb		analysis.ipynb
prepare.py		prepare.py
program.md		program.md
progress.png		progress.png
pyproject.toml		pyproject.toml
test.txt		test.txt
train.py		train.py
uv.lock		uv.lock
view_logs.py		view_logs.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🛡️ Antinjekt: The Impervious Vault

Autonomous Multi-Agent Defense Against Indirect Prompt Injections(W.I.P.)

The Architecture: "The Vault"

1. The Sanitization Layer (Pass 1)

2. The Triage & Feedback Loop

3. The Main Actor

4. The Jury (Consensus Benchmarking)

📊 The Gauntlet: Benchmark Performance

🚀 Quick Start

1. Clone & Setup

2. Configure Environment

3. Run the Gauntlet

Key Features

Research & Logs

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🛡️ Antinjekt: The Impervious Vault

Autonomous Multi-Agent Defense Against Indirect Prompt Injections(W.I.P.)

The Architecture: "The Vault"

1. The Sanitization Layer (Pass 1)

2. The Triage & Feedback Loop

3. The Main Actor

4. The Jury (Consensus Benchmarking)

📊 The Gauntlet: Benchmark Performance

🚀 Quick Start

1. Clone & Setup

2. Configure Environment

3. Run the Gauntlet

Key Features

Research & Logs

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages