LLM Security Toolkit

Architecture & Detailed Technical Specification

A production-grade Python middleware library for securing every LLM call, input guards, output validation, tool-call enforcement, and policy-driven control.

1. What Is This Project?

The LLM Security Toolkit is a Python middleware library that sits between your application code and any LLM provider, intercepting every model call to enforce security checks before and after the AI responds.

Think of it as a security firewall specifically designed for AI calls:

Scans every prompt for injection and jailbreak patterns before it reaches the model
Validates every response for unsafe content, credential leaks, or dangerous commands
Enforces schema rules on every tool/function call the model tries to make
Applies configurable policies to decide whether to block, warn, or log

The library exposes a clean, importable API, just a few extra lines in any existing Python AI app. No infrastructure changes required.

Property	Value
Type	Python library (importable package, pip-installable)
Purpose	Security middleware between app code and LLM provider
Primary Interface	Decorator / context manager / provider wrapper
Guards	Prompt injection, unsafe output, dangerous tool calls
Policy Engine	YAML or Python dict — per-endpoint policies
Provider Support	OpenAI (v1), Generic callable, extensible to Claude, Gemini
Framework Support	FastAPI (native), Flask (planned)
Return Type	`GuardDecision { allowed, score, reasons, safe_output }`

2. High-Level Architecture

2.1 System Overview

The toolkit is organized into five distinct layers, each with a clearly scoped responsibility:

┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                 Middleware Layer (FastAPI / Flask)           │
│          Dependency injection or global middleware           │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                    Providers Layer                          │
│         OpenAIProvider / GenericProvider (adapters)         │
└──────┬─────────────────────┴──────────────────┬────────────┘
       │                                         │
┌──────▼──────┐                         ┌───────▼──────┐
│   Guards    │                         │   Guards     │
│  (Input)    │                         │  (Output)    │
│ prompts.py  │                         │ outputs.py   │
│  tools.py   │                         │  tools.py    │
└──────┬──────┘                         └───────┬──────┘
       │                                         │
┌──────▼─────────────────────────────────────────▼────────────┐
│                     Policy Engine                           │
│           Policy | config.py | policies.py                  │
└─────────────────────────────────────────────────────────────┘

2.2 Request / Response Flow

Every guarded LLM call follows a six-stage pipeline:

#	Stage	What Happens
1	App calls guarded provider	`guarded_openai_chat(prompt, tools, policy)`
2	Input scanners run	`scan_prompt()` on prompt + system instructions
3	Risk decision	Block / Warn / Allow based on policy thresholds
4	Forward to LLM	Real API call to OpenAI / local model
5	Output scanners run	`scan_output()` + `validate_tool_call()`
6	Return GuardDecision	`{ allowed, score, reasons, safe_output }`

At each stage, if a policy threshold is exceeded, the pipeline short-circuits and returns a GuardDecision immediately — the LLM is never reached for blocked inputs, and blocked outputs are never returned to the user.

3. Repository Structure

The project follows the src-layout convention to avoid import conflicts and mirrors the separation of concerns across its five internal layers:

llm-security-toolkit/
├── README.md                    # Project overview and quick-start
├── CONTRIBUTING.md              # Fork & contribution guide
├── CODE_OF_CONDUCT.md           # Community standards
├── LICENSE                      # MIT (encourages forks)
├── pyproject.toml               # Build config, deps, tool settings
├── .pre-commit-config.yaml      # ruff, black, mypy on every commit
├── .github/
│   └── workflows/ci.yml         # Tests + lint on push / PR
│
├── src/llm_security/            # Main package (src layout)
│   ├── __init__.py              # Public re-exports
│   ├── types.py                 # ScanResult, GuardDecision, ToolCall
│   ├── policies.py              # Policy models + built-in presets
│   ├── config.py                # YAML / dict → Policy loaders
│   ├── exceptions.py            # BlockedByPolicyError, etc.
│   ├── logging.py               # log_decision() + hooks
│   ├── guards/
│   │   ├── prompts.py           # Input / injection guards
│   │   ├── outputs.py           # Output / content guards
│   │   └── tools.py             # Tool-call validation guards
│   ├── providers/
│   │   ├── base.py              # ProviderAdapter ABC
│   │   ├── openai.py            # OpenAI concrete adapter
│   │   └── generic.py           # Generic callable adapter
│   └── middleware/
│       ├── fastapi.py           # FastAPI dependency + middleware
│       └── flask.py             # Flask (planned)
│
├── tests/                       # Pytest test suite
├── examples/                    # Runnable minimal examples
└── docs/                        # MkDocs documentation

4. Core Types & Models

File: src/llm_security/types.py

Every part of the library speaks the same three data structures. These are the lingua franca of the entire package.

ScanResult

The output of a single guard check. Every guard function returns one of these:

@dataclass
class ScanResult:
    allowed:     bool            # True = safe to proceed
    score:       float           # 0.0 (safe) → 1.0 (critical risk)
    reasons:     List[str]       # Human-readable explanations
    safe_output: Optional[str]   # Redacted text (output guards only)

GuardDecision

The top-level result returned to your application — an aggregation of all ScanResults from all active guards:

@dataclass
class GuardDecision:
    allowed:      bool
    score:        float
    reasons:      List[str]
    safe_output:  Optional[str]
    scan_results: List[ScanResult]  # Full audit trail

ToolCall

Represents a structured tool/function invocation that the model requested:

@dataclass
class ToolCall:
    name:   str    # e.g. 'read_file'
    args:   Dict   # e.g. { 'path': '/etc/passwd' }
    schema: Dict   # JSON Schema the args must conform to

5. Policy Engine

Files: policies.py + config.py

A Policy is the single configuration object that controls the entire security pipeline. Every guard, every provider, every middleware reads from it.

5.1 Policy Structure

class Policy(BaseModel):
    # Guard toggles
    prompt_guard_enabled:  bool  = True
    output_guard_enabled:  bool  = True
    tool_guard_enabled:    bool  = True

    # Thresholds (0.0 – 1.0)
    block_threshold:  float = 0.75   # Score above this → block
    warn_threshold:   float = 0.40   # Score above this → log warning

    # Allowed tool names (None = allow all)
    allowed_tools:  Optional[List[str]] = None

    # On block: raise exception OR return GuardDecision
    raise_on_block: bool = True

5.2 Built-in Policy Presets

Policy	Behavior	Best For
`StrictPolicy`	Block on any risk signal	Production, sensitive apps
`BalancedPolicy`	Block high-risk, warn medium	Standard apps (default)
`LoggingOnlyPolicy`	Never block — log only	Development / testing

5.3 Loading Policies

# From YAML file (recommended for production)
policy = load_policy_from_yaml('policies/production.yaml')

# From dict (useful in tests)
policy = load_policy_from_dict({
    'block_threshold': 0.8,
    'allowed_tools': ['read_file', 'search_web'],
})

6. The Guards Layer

Files: src/llm_security/guards/

Guards are the security brain of the toolkit. Each guard module is small, focused, and independently testable — designed to be easy to fork and extend with new detection rules.

Guard Module	Pattern Detected	Category	Default Action
Prompt Guard	`"ignore previous instructions"`	Injection	Block or warn
Prompt Guard	`"pretend you are the system"`	Jailbreak	Block or warn
Prompt Guard	Requests to reveal hidden context	Exfiltration	Block
Output Guard	API keys, tokens, passwords	Secret leak	Redact + warn
Output Guard	Shell commands (`rm -rf`, `curl`, etc.)	OS command	Block
Output Guard	Self-harm or malware instructions	Content	Block
Tool Guard	Invalid tool name	Schema	Block
Tool Guard	`rm -rf /` or admin API calls	Dangerous op	Block
Tool Guard	Args not matching schema	Validation	Block

6.1 Prompt Guard — `guards/prompts.py`

Runs before any API call. Scans the user prompt and system instructions for patterns that indicate an attempt to subvert the model's behaviour.

def scan_prompt(prompt: str, policy: Policy) -> ScanResult:
    """
    Heuristic patterns checked:
      - 'ignore previous instructions' / 'disregard above'
      - 'you are now the system prompt'
      - 'repeat everything above' (context exfiltration)
      - 'DAN' jailbreak variants
      - Base64 encoded instructions
    Returns ScanResult with score + reasons.
    """

6.2 Output Guard — `guards/outputs.py`

Runs on every token of the model's response before it reaches your application. Can optionally redact sensitive material rather than blocking outright.

def scan_output(text: str, policy: Policy) -> ScanResult:
    """
    Patterns checked:
      - Credential regexes (API keys, JWTs, SSH private keys)
      - Shell command patterns (rm, curl, wget, sudo)
      - Malware / ransomware indicators
      - Self-harm or violence instructions
    safe_output field will contain redacted version if score < block_threshold.
    """

6.3 Tool Call Guard — `guards/tools.py`

Intercepts every function/tool invocation the model wants to make and validates it against the policy's allowlist and the tool's JSON schema.

def validate_tool_call(call: ToolCall, policy: Policy) -> ScanResult:
    """
    Checks applied:
      - Tool name in policy.allowed_tools (if allowlist defined)
      - Args validate against call.schema (jsonschema)
      - Blocked operation patterns (file deletion, network scanning)
      - Internal admin API URL detection
    """

7. Providers Layer

Files: src/llm_security/providers/

Providers wrap real LLM clients. They orchestrate the full guard pipeline — input scan → forward → output scan — and return a GuardDecision to the caller.

File	Responsibility
`providers/base.py`	Abstract base class `ProviderAdapter`. Defines the `chat()` interface all providers must implement.
`providers/openai.py`	Concrete adapter wrapping the OpenAI Python SDK. Runs all guards automatically around every `chat()` call.
`providers/generic.py`	Accepts any callable as the LLM. The user passes their own client function; the adapter handles the full guard flow around it.

7.1 ProviderAdapter Interface

class ProviderAdapter(ABC):
    @abstractmethod
    def chat(
        self, *,
        messages: List[Dict],
        tools:    Optional[List[Dict]] = None,
        policy:   Optional[Policy] = None,
    ) -> GuardDecision: ...

7.2 OpenAI Adapter Flow

scan_prompt() on all user + system messages
If allowed → call openai.chat.completions.create(...)
scan_output() on response content
validate_tool_call() on any tool_calls the model requested
Return GuardDecision aggregating all results

8. Middleware Layer

Files: src/llm_security/middleware/

The middleware layer makes it trivial to guard an entire HTTP endpoint with almost no code change.

8.1 FastAPI Integration

# Dependency injection — guard all calls to /chat
def get_guarded_openai(policy: Policy = BalancedPolicy()):
    return OpenAIProvider(policy=policy)

@app.post('/chat')
async def chat(
    req: ChatRequest,
    provider: OpenAIProvider = Depends(get_guarded_openai),
):
    decision = provider.chat(messages=req.messages)
    if not decision.allowed:
        raise HTTPException(400, detail=decision.reasons)
    return { 'reply': decision.safe_output }

8.2 Middleware vs Dependency

Approach	Best For
Dependency (`Depends`)	Per-route policy. Inject a different provider per endpoint. Most flexible.
Middleware class	Global policy applied to every request. Good for org-wide defaults.
Flask middleware	Planned for v1.1. Same pattern adapted for Flask's `before/after_request` hooks.

9. Logging & Exceptions

9.1 Structured Logging — `logging.py`

Every GuardDecision can be passed to log_decision() which emits a structured JSON log entry compatible with any logging backend:

def log_decision(decision: GuardDecision, logger: logging.Logger) -> None:
    logger.info({
        'allowed':   decision.allowed,
        'score':     decision.score,
        'reasons':   decision.reasons,
        'timestamp': datetime.utcnow().isoformat(),
    })
# Future: OpenTelemetry spans, Datadog trace hooks

9.2 Exception Hierarchy — `exceptions.py`

Exception	When Raised
`BlockedByPolicyError`	Prompt or output exceeds `block_threshold` and `raise_on_block=True`
`InvalidToolCallError`	Tool name not in allowlist, or args fail schema validation

10. Public API Surface

File: src/llm_security/__init__.py

The top-level package re-exports everything a user needs. Nothing implementation-specific is public:

from .providers.openai  import OpenAIProvider
from .providers.generic import GenericProvider
from .policies          import StrictPolicy, BalancedPolicy, LoggingOnlyPolicy
from .config            import load_policy_from_dict, load_policy_from_yaml
from .types             import ScanResult, GuardDecision, ToolCall
from .exceptions        import BlockedByPolicyError, InvalidToolCallError

__all__ = [
    'OpenAIProvider', 'GenericProvider',
    'StrictPolicy', 'BalancedPolicy', 'LoggingOnlyPolicy',
    'load_policy_from_dict', 'load_policy_from_yaml',
    'ScanResult', 'GuardDecision', 'ToolCall',
    'BlockedByPolicyError', 'InvalidToolCallError',
]

11. Tests, Examples & Docs

11.1 Test Suite — `tests/`

File	Covers
`test_policies.py`	Policy loading from dict and YAML, threshold logic, preset validation
`test_guards_prompts.py`	Each injection and jailbreak pattern: pass and fail cases
`test_guards_outputs.py`	Credential regex, OS command patterns, content categories
`test_guards_tools.py`	Schema validation, allowlist enforcement, blocked operations
`test_providers_openai.py`	OpenAI adapter with mocked API — full pipeline test
`test_middleware_fastapi.py`	FastAPI TestClient integration — dependency injection

11.2 Examples — `examples/`

basic_openai_guard.py — Minimal OpenAI guard in 15 lines
fastapi_endpoint_guard.py — Full FastAPI endpoint with policy injection
custom_policy_example.py — Writing and loading a custom YAML policy

11.3 Documentation — `docs/`

getting-started.md — Install, first call, first policy
configuration.md — Full Policy reference and YAML schema
providers.md — How to add a new ProviderAdapter
middleware.md — FastAPI and Flask integration guides
contributing.md — Adding new guard rules, running tests

12. Extensibility & Design Principles

The toolkit is deliberately designed to be fork-friendly and contribution-friendly. These principles guide every architectural decision:

Small, Focused Guards

Each guard function is a single Python function with one job. Adding a new detection rule means adding one function and one test — no class hierarchies to navigate.

Policy-First Design

All security decisions flow through the Policy object. Operators can change security posture (strict vs. logging-only) with a config file change — no code change required.

Provider Abstraction

The ProviderAdapter ABC means any LLM client can be wrapped. Adding Claude, Gemini, or a local Ollama model requires implementing one method: chat().

Zero Infra Requirement

The toolkit is a pure Python package. No sidecar, no agent, no proxy. It runs in-process alongside your existing app.

13. Future Roadmap

Version	Feature	Status
v1.0	OpenAI adapter + prompt/output/tool guards + FastAPI	Planned
v1.1	Anthropic (Claude) provider adapter	Planned
v1.1	Flask middleware	Planned
v1.2	OpenTelemetry tracing integration	Idea
v1.3	Gemini + local model (Ollama) adapters	Idea
v2.0	Optional hosted SaaS gateway pairing	Future

LLM Security Toolkit — Architecture Document v1.0

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
src		src
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md

Folders and files

Latest commit

History

Repository files navigation

LLM Security Toolkit

Architecture & Detailed Technical Specification

Table of Contents

1. What Is This Project?

2. High-Level Architecture

2.1 System Overview

2.2 Request / Response Flow

3. Repository Structure

4. Core Types & Models

ScanResult

GuardDecision

ToolCall

5. Policy Engine

5.1 Policy Structure

5.2 Built-in Policy Presets

5.3 Loading Policies

6. The Guards Layer

6.1 Prompt Guard — guards/prompts.py

6.2 Output Guard — guards/outputs.py

6.3 Tool Call Guard — guards/tools.py

7. Providers Layer

7.1 ProviderAdapter Interface

7.2 OpenAI Adapter Flow

8. Middleware Layer

8.1 FastAPI Integration

8.2 Middleware vs Dependency

9. Logging & Exceptions

9.1 Structured Logging — logging.py

9.2 Exception Hierarchy — exceptions.py

10. Public API Surface

11. Tests, Examples & Docs

11.1 Test Suite — tests/

11.2 Examples — examples/

11.3 Documentation — docs/

12. Extensibility & Design Principles

Small, Focused Guards

Policy-First Design

Provider Abstraction

Zero Infra Requirement

13. Future Roadmap

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

6.1 Prompt Guard — `guards/prompts.py`

6.2 Output Guard — `guards/outputs.py`

6.3 Tool Call Guard — `guards/tools.py`

9.1 Structured Logging — `logging.py`

9.2 Exception Hierarchy — `exceptions.py`

11.1 Test Suite — `tests/`

11.2 Examples — `examples/`

11.3 Documentation — `docs/`

Packages