Guardrails: Route embedder chat through guardrails and add UI toggle #234

Open

FilipCivljak wants to merge 1 commit into main

Contributor

FilipCivljak commented May 15, 2026

What type of PR is this?

This is a feature because it adds content guardrails to the embedder chat pipeline with a UI toggle to enable/disable them at runtime.

What does this do?

Adds a POST /guardrails/validate fast pre-filter endpoint to the Python guardrails service that checks user input against safety patterns (jailbreak, prompt injection, restricted topics, toxicity, etc.) before forwarding to NeMo
Adds a GuardedClient Go wrapper around the LLM client that calls the guardrails validate endpoint before streaming chat; blocked queries receive a refusal message instead of reaching the LLM
Exposes GET /api/v1/guardrails and PUT /api/v1/guardrails endpoints on the embedder (auth-gated) for reading and toggling guardrails state at runtime without restart
Adds an EMBEDDER_GUARDRAILS_URL env var: an empty value disables guardrails entirely, and the compose stack defaults it to http://guardrails:8001
Adds a Guardrails monitoring page to the sidebar
Adds a Safety section to the Config page with a live on/off toggle bound to the new API

Which issue(s) does this PR fix/relate to?

Have you included tests for your changes?

No. The guardrails validate endpoint is a thin pass-through wrapper over pattern matching; manual testing was performed by sending known blocked phrases (e.g. "ignore previous instructions") and verifying a refusal is returned without the LLM being called, and by toggling the switch in the Config UI.

Did you document any new/modified features?

Notes

The validate endpoint runs only regex/substring matching (no NeMo/LLM call) so latency is <1 ms. NeMo still processes requests that pass the pre-filter, providing a second semantic layer. The toggle uses atomic.Bool so it is safe to flip at runtime under concurrent requests.


Guardrails: Route embedder chat through guardrails and add UI toggle

3fa4ef4

