Guardrails: Route embedder chat through guardrails and add UI toggle #234
Open
FilipCivljak wants to merge 1 commit into
What type of PR is this?
This is a feature: it adds content guardrails to the embedder chat pipeline, plus a UI toggle to enable or disable them at runtime.
What does this do?
Adds a fast pre-filter endpoint, POST /guardrails/validate, to the Python guardrails service. It checks user input against safety patterns (jailbreak, prompt injection, restricted topics, toxicity, etc.) before the request is forwarded to NeMo.
Adds a GuardedClient Go wrapper around the LLM client. It calls the guardrails validate endpoint before streaming chat; blocked queries receive a refusal message and never reach the LLM.
Exposes auth-gated GET /api/v1/guardrails and PUT /api/v1/guardrails endpoints on the embedder for reading and toggling the guardrails state at runtime, without a restart.
Adds the EMBEDDER_GUARDRAILS_URL env var. Leaving it empty disables guardrails entirely; in the compose stack it defaults to http://guardrails:8001.
Adds a Guardrails monitoring page to the sidebar.
Adds a Safety section to the Config page with a live on/off toggle bound to the new API.
Which issue(s) does this PR fix/relate to?
Have you included tests for your changes?
No. The guardrails validate endpoint is a thin wrapper over pattern matching. It was tested manually by sending known blocked phrases (e.g. "ignore previous instructions") and verifying that a refusal is returned without the LLM being called, and by toggling the switch in the Config UI.
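The manual check above relies on the validate endpoint's pattern matching. That endpoint lives in the Python service and its pattern list is not part of this description, so the following is only an illustration of that kind of regex/substring pre-filter, written in Go, the embedder's language. The Rule, Verdict, and Check names, the category labels, and every phrase and regex are hypothetical.

```go
package main

import (
	"regexp"
	"strings"
)

// Rule is one hypothetical pre-filter rule. Input is lower-cased and
// whitespace-normalised before matching, so phrases are written in lower case.
type Rule struct {
	Category   string
	Substrings []string         // plain phrases checked with strings.Contains
	Patterns   []*regexp.Regexp // regular expressions for phrasing variants
}

// rules is illustrative only; the service's real pattern set is not shown in the PR.
var rules = []Rule{
	{
		Category: "prompt_injection",
		Patterns: []*regexp.Regexp{
			regexp.MustCompile(`\bignore\s+(all\s+)?(previous|prior)\s+instructions\b`),
		},
	},
	{
		Category:   "jailbreak",
		Substrings: []string{"developer mode", "do anything now"},
	},
}

// Verdict is the pre-filter result; Category is empty when Allowed is true.
type Verdict struct {
	Allowed  bool
	Category string
}

// Check returns the first matching category, or Allowed when nothing matches.
// It makes no network or model calls, which is what keeps a pre-filter cheap.
func Check(input string) Verdict {
	normalized := strings.ToLower(strings.Join(strings.Fields(input), " "))
	for _, r := range rules {
		for _, s := range r.Substrings {
			if strings.Contains(normalized, s) {
				return Verdict{Allowed: false, Category: r.Category}
			}
		}
		for _, p := range r.Patterns {
			if p.MatchString(normalized) {
				return Verdict{Allowed: false, Category: r.Category}
			}
		}
	}
	return Verdict{Allowed: true}
}
```

Normalising case and whitespace first lets one lower-case phrase cover variants such as "IGNORE   previous instructions". A filter like this only catches phrasings it knows about, which is why a semantic layer behind it is still useful.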
Did you document any new/modified features?
Notes
The validate endpoint runs only regex/substring matching (no NeMo or LLM call), so its latency is under 1 ms. Requests that pass the pre-filter are still processed by NeMo, which provides a second, semantic layer. The toggle is backed by an atomic.Bool, so it is safe to flip at runtime while concurrent requests are in flight.