A production-grade Python middleware library for securing every LLM call: input guards, output validation, tool-call enforcement, and policy-driven control.
- What Is This Project?
- High-Level Architecture
- Repository Structure
- Core Types & Models
- Policy Engine
- The Guards Layer
- Providers Layer
- Middleware Layer
- Logging & Exceptions
- Public API Surface
- Tests, Examples & Docs
- Extensibility & Design Principles
- Future Roadmap
## What Is This Project?

The LLM Security Toolkit is a Python middleware library that sits between your application code and any LLM provider, intercepting every model call to enforce security checks before and after the AI responds.
Think of it as a security firewall specifically designed for AI calls:
- Scans every prompt for injection and jailbreak patterns before it reaches the model
- Validates every response for unsafe content, credential leaks, or dangerous commands
- Enforces schema rules on every tool/function call the model tries to make
- Applies configurable policies to decide whether to block, warn, or log
The library exposes a clean, importable API: integrating it takes just a few extra lines in any existing Python AI app, with no infrastructure changes required.
| Property | Value |
|---|---|
| Type | Python library (importable package, pip-installable) |
| Purpose | Security middleware between app code and LLM provider |
| Primary Interface | Decorator / context manager / provider wrapper |
| Guards | Prompt injection, unsafe output, dangerous tool calls |
| Policy Engine | YAML or Python dict — per-endpoint policies |
| Provider Support | OpenAI (v1), Generic callable, extensible to Claude, Gemini |
| Framework Support | FastAPI (native), Flask (planned) |
| Return Type | GuardDecision { allowed, score, reasons, safe_output } |
## High-Level Architecture

The toolkit is organized into five distinct layers, each with a clearly scoped responsibility:
```
┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│              Middleware Layer (FastAPI / Flask)             │
│           Dependency injection or global middleware         │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                       Providers Layer                       │
│           OpenAIProvider / GenericProvider (adapters)       │
└──────┬─────────────────────┴──────────────────┬────────────┘
       │                                        │
┌──────▼──────┐                         ┌───────▼──────┐
│   Guards    │                         │    Guards    │
│   (Input)   │                         │   (Output)   │
│  prompts.py │                         │  outputs.py  │
│  tools.py   │                         │  tools.py    │
└──────┬──────┘                         └───────┬──────┘
       │                                        │
┌──────▼─────────────────────────────────────────▼────────────┐
│                        Policy Engine                        │
│                Policy | config.py | policies.py             │
└─────────────────────────────────────────────────────────────┘
```
Every guarded LLM call follows a six-stage pipeline:
| # | Stage | What Happens |
|---|---|---|
| 1 | App calls guarded provider | guarded_openai_chat(prompt, tools, policy) |
| 2 | Input scanners run | scan_prompt() on prompt + system instructions |
| 3 | Risk decision | Block / Warn / Allow based on policy thresholds |
| 4 | Forward to LLM | Real API call to OpenAI / local model |
| 5 | Output scanners run | scan_output() + validate_tool_call() |
| 6 | Return GuardDecision | { allowed, score, reasons, safe_output } |
At each stage, if a policy threshold is exceeded, the pipeline short-circuits and returns a `GuardDecision` immediately — the LLM is never reached for blocked inputs, and blocked outputs are never returned to the user.
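The six stages and the short-circuit behaviour can be sketched with stand-in guard functions. The names below mirror the toolkit's public types, but the scanners and scoring are illustrative assumptions, not the library's real rules:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None

def guarded_call(
    prompt: str,
    llm: Callable[[str], str],
    scan_prompt: Callable[[str], float],
    scan_output: Callable[[str], float],
    block_threshold: float = 0.75,
) -> GuardDecision:
    # Stages 2-3: input scan; short-circuit before the model is ever called
    in_score = scan_prompt(prompt)
    if in_score > block_threshold:
        return GuardDecision(False, in_score, ["prompt blocked by policy"])
    # Stage 4: forward to the real LLM only for allowed inputs
    reply = llm(prompt)
    # Stages 5-6: output scan; blocked outputs never reach the user
    out_score = scan_output(reply)
    if out_score > block_threshold:
        return GuardDecision(False, out_score, ["output blocked by policy"])
    return GuardDecision(True, max(in_score, out_score), [], safe_output=reply)

# Toy scanner standing in for the real guards
flag = lambda text: 0.9 if "ignore previous instructions" in text.lower() else 0.0
blocked = guarded_call("Ignore previous instructions", lambda p: "ok", flag, flag)
passed = guarded_call("What is 2+2?", lambda p: "4", flag, flag)
```

Note that for `blocked` the `llm` callable is never invoked: the pipeline returns at stage 3.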
## Repository Structure

The project follows the src-layout convention to avoid import conflicts and mirrors the separation of concerns across its five internal layers:
```
llm-security-toolkit/
├── README.md                   # Project overview and quick-start
├── CONTRIBUTING.md             # Fork & contribution guide
├── CODE_OF_CONDUCT.md          # Community standards
├── LICENSE                     # MIT (encourages forks)
├── pyproject.toml              # Build config, deps, tool settings
├── .pre-commit-config.yaml     # ruff, black, mypy on every commit
├── .github/
│   └── workflows/ci.yml        # Tests + lint on push / PR
│
├── src/llm_security/           # Main package (src layout)
│   ├── __init__.py             # Public re-exports
│   ├── types.py                # ScanResult, GuardDecision, ToolCall
│   ├── policies.py             # Policy models + built-in presets
│   ├── config.py               # YAML / dict → Policy loaders
│   ├── exceptions.py           # BlockedByPolicyError, etc.
│   ├── logging.py              # log_decision() + hooks
│   ├── guards/
│   │   ├── prompts.py          # Input / injection guards
│   │   ├── outputs.py          # Output / content guards
│   │   └── tools.py            # Tool-call validation guards
│   ├── providers/
│   │   ├── base.py             # ProviderAdapter ABC
│   │   ├── openai.py           # OpenAI concrete adapter
│   │   └── generic.py          # Generic callable adapter
│   └── middleware/
│       ├── fastapi.py          # FastAPI dependency + middleware
│       └── flask.py            # Flask (planned)
│
├── tests/                      # Pytest test suite
├── examples/                   # Runnable minimal examples
└── docs/                       # MkDocs documentation
```
## Core Types & Models

File: `src/llm_security/types.py`
Every part of the library speaks the same three data structures. These are the lingua franca of the entire package.
The output of a single guard check. Every guard function returns one of these:
```python
@dataclass
class ScanResult:
    allowed: bool                # True = safe to proceed
    score: float                 # 0.0 (safe) → 1.0 (critical risk)
    reasons: List[str]           # Human-readable explanations
    safe_output: Optional[str]   # Redacted text (output guards only)
```

The top-level result returned to your application — an aggregation of all ScanResults from all active guards:
```python
@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str]
    scan_results: List[ScanResult]   # Full audit trail
```

Represents a structured tool/function invocation that the model requested:
```python
@dataclass
class ToolCall:
    name: str      # e.g. 'read_file'
    args: Dict     # e.g. { 'path': '/etc/passwd' }
    schema: Dict   # JSON Schema the args must conform to
```

## Policy Engine

Files: `policies.py` + `config.py`
A Policy is the single configuration object that controls the entire security pipeline. Every guard, every provider, every middleware reads from it.
```python
class Policy(BaseModel):
    # Guard toggles
    prompt_guard_enabled: bool = True
    output_guard_enabled: bool = True
    tool_guard_enabled: bool = True

    # Thresholds (0.0 – 1.0)
    block_threshold: float = 0.75   # Score above this → block
    warn_threshold: float = 0.40    # Score above this → log warning

    # Allowed tool names (None = allow all)
    allowed_tools: Optional[List[str]] = None

    # On block: raise exception OR return GuardDecision
    raise_on_block: bool = True
```

| Policy | Behavior | Best For |
|---|---|---|
| `StrictPolicy` | Block on any risk signal | Production, sensitive apps |
| `BalancedPolicy` | Block high-risk, warn medium | Standard apps (default) |
| `LoggingOnlyPolicy` | Never block — log only | Development / testing |
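The two thresholds partition guard scores into three actions. A minimal sketch of that decision — the tri-state mapping is implied by the `Policy` fields above; the action names themselves are illustrative:

```python
def decide(score: float, block_threshold: float = 0.75,
           warn_threshold: float = 0.40) -> str:
    """Map a guard score to an action using the two Policy thresholds."""
    if score > block_threshold:   # "Score above this → block"
        return "block"
    if score > warn_threshold:    # above warn but below block → log a warning
        return "warn"
    return "allow"
```

A `LoggingOnlyPolicy`-style preset would simply never return `"block"` regardless of score.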
```python
# From YAML file (recommended for production)
policy = load_policy_from_yaml('policies/production.yaml')

# From dict (useful in tests)
policy = load_policy_from_dict({
    'block_threshold': 0.8,
    'allowed_tools': ['read_file', 'search_web'],
})
```
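A `policies/production.yaml` consumed by the loader above might look like this — the field names come from the `Policy` model; the values are illustrative:

```yaml
# policies/production.yaml — illustrative values only
prompt_guard_enabled: true
output_guard_enabled: true
tool_guard_enabled: true
block_threshold: 0.8
warn_threshold: 0.4
allowed_tools:
  - read_file
  - search_web
raise_on_block: true
```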
## The Guards Layer

Files: `src/llm_security/guards/`
Guards are the security brain of the toolkit. Each guard module is small, focused, and independently testable — designed to be easy to fork and extend with new detection rules.
| Guard Module | Pattern Detected | Category | Default Action |
|---|---|---|---|
| Prompt Guard | "ignore previous instructions" | Injection | Block or warn |
| Prompt Guard | "pretend you are the system" | Jailbreak | Block or warn |
| Prompt Guard | Requests to reveal hidden context | Exfiltration | Block |
| Output Guard | API keys, tokens, passwords | Secret leak | Redact + warn |
| Output Guard | Shell commands (`rm -rf`, `curl`, etc.) | OS command | Block |
| Output Guard | Self-harm or malware instructions | Content | Block |
| Tool Guard | Invalid tool name | Schema | Block |
| Tool Guard | `rm -rf /` or admin API calls | Dangerous op | Block |
| Tool Guard | Args not matching schema | Validation | Block |
Runs before any API call. Scans the user prompt and system instructions for patterns that indicate an attempt to subvert the model's behaviour.
```python
def scan_prompt(prompt: str, policy: Policy) -> ScanResult:
    """
    Heuristic patterns checked:
      - 'ignore previous instructions' / 'disregard above'
      - 'you are now the system prompt'
      - 'repeat everything above' (context exfiltration)
      - 'DAN' jailbreak variants
      - Base64 encoded instructions

    Returns ScanResult with score + reasons.
    """
```

Runs on every token of the model's response before it reaches your application. Can optionally redact sensitive material rather than blocking outright.
```python
def scan_output(text: str, policy: Policy) -> ScanResult:
    """
    Patterns checked:
      - Credential regexes (API keys, JWTs, SSH private keys)
      - Shell command patterns (rm, curl, wget, sudo)
      - Malware / ransomware indicators
      - Self-harm or violence instructions

    safe_output field will contain redacted version if score < block_threshold.
    """
```

Intercepts every function/tool invocation the model wants to make and validates it against the policy's allowlist and the tool's JSON schema.
```python
def validate_tool_call(call: ToolCall, policy: Policy) -> ScanResult:
    """
    Checks applied:
      - Tool name in policy.allowed_tools (if allowlist defined)
      - Args validate against call.schema (jsonschema)
      - Blocked operation patterns (file deletion, network scanning)
      - Internal admin API URL detection
    """
```
## Providers Layer

Files: `src/llm_security/providers/`
Providers wrap real LLM clients. They orchestrate the full guard pipeline — input scan → forward → output scan — and return a GuardDecision to the caller.
| File | Responsibility |
|---|---|
| `providers/base.py` | Abstract base class `ProviderAdapter`. Defines the `chat()` interface all providers must implement. |
| `providers/openai.py` | Concrete adapter wrapping the OpenAI Python SDK. Runs all guards automatically around every `chat()` call. |
| `providers/generic.py` | Accepts any callable as the LLM. The user passes their own client function; the adapter handles the full guard flow around it. |
```python
class ProviderAdapter(ABC):
    @abstractmethod
    def chat(
        self, *,
        messages: List[Dict],
        tools: Optional[List[Dict]] = None,
        policy: Optional[Policy] = None,
    ) -> GuardDecision: ...
```

The OpenAI adapter runs the full guard pipeline around every call:

- `scan_prompt()` on all user + system messages
- If allowed → call `openai.chat.completions.create(...)`
- `scan_output()` on response content
- `validate_tool_call()` on any `tool_calls` the model requested
- Return `GuardDecision` aggregating all results
## Middleware Layer

Files: `src/llm_security/middleware/`
The middleware layer makes it trivial to guard an entire HTTP endpoint with almost no code change.
```python
# Dependency injection — guard all calls to /chat
def get_guarded_openai(policy: Policy = BalancedPolicy()):
    return OpenAIProvider(policy=policy)

@app.post('/chat')
async def chat(
    req: ChatRequest,
    provider: OpenAIProvider = Depends(get_guarded_openai),
):
    decision = provider.chat(messages=req.messages)
    if not decision.allowed:
        raise HTTPException(400, detail=decision.reasons)
    return { 'reply': decision.safe_output }
```

| Approach | Best For |
|---|---|
| Dependency (`Depends`) | Per-route policy. Inject a different provider per endpoint. Most flexible. |
| Middleware class | Global policy applied to every request. Good for org-wide defaults. |
| Flask middleware | Planned for v1.1. Same pattern adapted for Flask's `before/after_request` hooks. |
## Logging & Exceptions

Every `GuardDecision` can be passed to `log_decision()`, which emits a structured JSON log entry compatible with any logging backend:
```python
def log_decision(decision: GuardDecision, logger: logging.Logger) -> None:
    logger.info({
        'allowed': decision.allowed,
        'score': decision.score,
        'reasons': decision.reasons,
        'timestamp': datetime.utcnow().isoformat(),
    })

# Future: OpenTelemetry spans, Datadog trace hooks
```

| Exception | When Raised |
|---|---|
| `BlockedByPolicyError` | Prompt or output exceeds `block_threshold` and `raise_on_block=True` |
| `InvalidToolCallError` | Tool name not in allowlist, or args fail schema validation |
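One way to make the log entry genuinely backend-agnostic is to serialize it as a single JSON line. This is a sketch, not the library's actual formatter; it also uses a timezone-aware timestamp, since `datetime.utcnow()` is deprecated in recent Python versions:

```python
import io
import json
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None

def log_decision(decision: GuardDecision, logger: logging.Logger) -> None:
    # One JSON object per line, so any log backend can parse entries
    logger.info(json.dumps({
        "allowed": decision.allowed,
        "score": decision.score,
        "reasons": decision.reasons,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # timezone-aware
    }))

# Capture a log entry in-memory to show the emitted shape
stream = io.StringIO()
demo_logger = logging.getLogger("llm_security.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(logging.StreamHandler(stream))
log_decision(GuardDecision(allowed=True, score=0.12, reasons=[]), demo_logger)
entry = json.loads(stream.getvalue())
```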
## Public API Surface

File: `src/llm_security/__init__.py`
The top-level package re-exports everything a user needs. Nothing implementation-specific is public:
```python
from .providers.openai import OpenAIProvider
from .providers.generic import GenericProvider
from .policies import StrictPolicy, BalancedPolicy, LoggingOnlyPolicy
from .config import load_policy_from_dict, load_policy_from_yaml
from .types import ScanResult, GuardDecision, ToolCall
from .exceptions import BlockedByPolicyError, InvalidToolCallError

__all__ = [
    'OpenAIProvider', 'GenericProvider',
    'StrictPolicy', 'BalancedPolicy', 'LoggingOnlyPolicy',
    'load_policy_from_dict', 'load_policy_from_yaml',
    'ScanResult', 'GuardDecision', 'ToolCall',
    'BlockedByPolicyError', 'InvalidToolCallError',
]
```

## Tests, Examples & Docs

| File | Covers |
|---|---|
| `test_policies.py` | Policy loading from dict and YAML, threshold logic, preset validation |
| `test_guards_prompts.py` | Each injection and jailbreak pattern: pass and fail cases |
| `test_guards_outputs.py` | Credential regex, OS command patterns, content categories |
| `test_guards_tools.py` | Schema validation, allowlist enforcement, blocked operations |
| `test_providers_openai.py` | OpenAI adapter with mocked API — full pipeline test |
| `test_middleware_fastapi.py` | FastAPI TestClient integration — dependency injection |
Runnable examples in `examples/`:

- `basic_openai_guard.py` — Minimal OpenAI guard in 15 lines
- `fastapi_endpoint_guard.py` — Full FastAPI endpoint with policy injection
- `custom_policy_example.py` — Writing and loading a custom YAML policy
Documentation pages in `docs/`:

- `getting-started.md` — Install, first call, first policy
- `configuration.md` — Full Policy reference and YAML schema
- `providers.md` — How to add a new ProviderAdapter
- `middleware.md` — FastAPI and Flask integration guides
- `contributing.md` — Adding new guard rules, running tests
## Extensibility & Design Principles

The toolkit is deliberately designed to be fork-friendly and contribution-friendly. These principles guide every architectural decision:
Each guard function is a single Python function with one job. Adding a new detection rule means adding one function and one test — no class hierarchies to navigate.
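For example, a contributed detection rule could be exactly one function plus one test. The rule below (flagging Cyrillic look-alike characters used to sneak past keyword filters) is hypothetical; the shape is what matters:

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class ScanResult:
    allowed: bool
    score: float
    reasons: List[str]

def scan_homoglyph_obfuscation(prompt: str) -> ScanResult:
    """Hypothetical new rule: flag Cyrillic look-alikes used to evade keyword filters."""
    hits = re.findall(r"[\u0400-\u04FF]", prompt)  # any Cyrillic code point
    score = 0.8 if hits else 0.0
    return ScanResult(
        allowed=score < 0.75,
        score=score,
        reasons=["possible homoglyph obfuscation"] if hits else [],
    )

def test_scan_homoglyph_obfuscation():
    assert not scan_homoglyph_obfuscation("ign\u043ere the rules").allowed  # '\u043e' is Cyrillic 'o'
    assert scan_homoglyph_obfuscation("ignore the rules").allowed
```

No class hierarchy, no registration machinery: the function and its test are the whole contribution.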
All security decisions flow through the Policy object. Operators can change security posture (strict vs. logging-only) with a config file change — no code change required.
The ProviderAdapter ABC means any LLM client can be wrapped. Adding Claude, Gemini, or a local Ollama model requires implementing one method: chat().
The toolkit is a pure Python package. No sidecar, no agent, no proxy. It runs in-process alongside your existing app.
## Future Roadmap

| Version | Feature | Status |
|---|---|---|
| v1.0 | OpenAI adapter + prompt/output/tool guards + FastAPI | Planned |
| v1.1 | Anthropic (Claude) provider adapter | Planned |
| v1.1 | Flask middleware | Planned |
| v1.2 | OpenTelemetry tracing integration | Idea |
| v1.3 | Gemini + local model (Ollama) adapters | Idea |
| v2.0 | Optional hosted SaaS gateway pairing | Future |
LLM Security Toolkit — Architecture Document v1.0