
Feat: Add support for request-time specification of model API keys#1677

Open
RobGeada wants to merge 1 commit into NVIDIA-NeMo:develop from RobGeada:header-api-keys

Conversation

@RobGeada
Contributor

@RobGeada RobGeada commented Feb 27, 2026

Description

This PR addresses #1676, introducing the ability to specify per-model API keys via request headers.

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      openai_api_key: 'deployer-key-1'
  - type: self-check
    engine: openai
    model: gpt-5
    parameters:
      openai_api_key: 'deployer-key-2'
```

Authorization API keys can be provided via headers of the form `X-{Model Name}-Authorization`, e.g., `X-GPT-4-Authorization`. The API token provided in the header will then be sent to all models whose name matches `{Model Name}`.

Discussion of the rationale and motivation for this change is in the linked issue. Thanks!

Related Issue(s)

Implements #1676

Checklist

  • [x] I've read the CONTRIBUTING guidelines.
  • [x] I've updated the documentation if applicable.
  • [x] I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR adds support for per-request API key injection via X-{model-name}-Authorization HTTP headers, enabling multi-tenant scenarios where callers can supply their own LLM credentials at request time. The implementation patches the shared llm_rails config and reinitializes LLMs before each request, then resets afterward.

Issues found:

  • Stale runtime params after reset: _reset_api_keys restores config.models and sets llm.llm = None, but never updates runtime.registered_action_params. This leaves header-key LLM instances registered even after the request completes. While the primary fallback path uses self.llm, the mismatch between restored config and retained runtime params is a structural inconsistency that should be fixed by calling _init_llms() in _reset_api_keys.

  • Streaming finally ordering bug: For streaming requests, return StreamingResponse(...) causes the outer finally block to execute before the generator is consumed by the ASGI server. This causes _reset_api_keys to run prematurely, setting llm_rails.llm = None before the stream even starts. The inner _format_streaming_response finally block then becomes a no-op, leaving streaming requests without proper key reset.

  • Redundant _reset_api_keys calls: The explicit reset calls in the except handlers (lines 657–661) are unnecessary since the finally block covers all paths. Removing them would clarify that cleanup happens unconditionally.
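The streaming ordering issue can be reproduced with a minimal, framework-free sketch: a `finally` attached to the function that *returns* a generator runs when the function returns, before the generator is ever consumed.

```python
events = []

def handler():
    # Minimal sketch of the ordering bug: the finally block fires when
    # handler() returns the generator, before any chunk is consumed.
    def stream():
        events.append("first chunk consumed")
        yield "chunk"
    try:
        return stream()
    finally:
        events.append("cleanup")

gen = handler()    # "cleanup" is recorded here, before the stream starts
first = next(gen)  # only now is the first chunk produced
```

After this runs, `events` is `["cleanup", "first chunk consumed"]`, which is exactly the premature-reset ordering described above.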

Confidence Score: 2/5

  • Not safe to merge — two logic bugs affect correctness of API key isolation: stale runtime params persist across requests, and streaming requests reset keys before they're consumed.
  • The core feature has three issues: (1) _reset_api_keys doesn't call _init_llms(), leaving registered LLM instances with header keys in the runtime params after reset; (2) for streaming, the outer finally runs before the generator is consumed, prematurely resetting keys and breaking stream processing; (3) redundant reset calls reduce code clarity. Issues 1 and 2 are correctness problems that could cause request failures or key misuse. Issue 3 is a style improvement. The implementation needs fixes to the reset logic before it's safe for production.
  • nemoguardrails/server/api.py — specifically _reset_api_keys (missing _init_llms() call) and the streaming request path's finally ordering.

Last reviewed commit: a3a103b

Contributor

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 5 comments


@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Additional Comments (1)

nemoguardrails/context.py
streaming_handler_var is defined twice (lines 23-24 and 32-34). The second definition overrides the first.

```python
if TYPE_CHECKING:
    from nemoguardrails.logging.explain import ExplainInfo
    from nemoguardrails.logging.stats import LLMStats
    from nemoguardrails.rails.llm.options import GenerationOptions
    from nemoguardrails.streaming import StreamingHandler

streaming_handler_var: contextvars.ContextVar[Optional["StreamingHandler"]] = contextvars.ContextVar(
    "streaming_handler", default=None
)
```

@codecov

codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 42.85714% with 20 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| nemoguardrails/server/api.py | 42.85% | 20 Missing ⚠️ |


@RobGeada
Contributor Author

RobGeada commented Mar 1, 2026

@greptileai

@RobGeada force-pushed the header-api-keys branch 2 times, most recently from 64ac9d4 to 0e6672c on March 4, 2026, 16:00
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1677

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

nemoguardrails/server/api.py
_reset_api_keys skipped on early-exit paths

After _set_api_keys modifies the shared llm_rails instance (lines 305–328), there are two early-return/raise paths in the second try block that never call _reset_api_keys, leaving the cached instance with a modified config and a dangling original_config attribute:

  1. Line 557: return create_error_chat_completion(...) for a short thread_id — the function returns before the finally: _reset_api_keys(llm_rails) block.
  2. Lines 576–579: raise HTTPException(...) for an invalid state format — the exception propagates out of the try block without calling reset.

In both cases, the next request that hits the same cached llm_rails instance will use the previous request's mutated config as its "original" to back up and restore.

Consider wrapping the entire second try block with a finally that calls _reset_api_keys, or checking hasattr(llm_rails, "original_config") at the top and ensuring cleanup happens on all exit paths.
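The point about early exits generalizes: in Python, a `return` (or `raise`) inside a `try` still runs the `finally`, so wrapping the whole block is a robust way to guarantee cleanup. A minimal sketch (names are illustrative, not the PR's actual code):

```python
cleanup_ran = []

def handle(thread_id: str) -> str:
    # Illustrative sketch: cleanup in finally runs on every exit path,
    # including early returns and raised exceptions.
    try:
        if len(thread_id) < 16:
            return "error: thread_id too short"  # early return still hits finally
        return "ok"
    finally:
        cleanup_ran.append(thread_id)

result = handle("short")  # cleanup_ran now contains "short"
```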


@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (2)

nemoguardrails/server/api.py, line 566
Early return bypasses _reset_api_keys

_set_api_keys is called at line 547, before the try block (which starts at line 549). When body.guardrails.thread_id exists but is shorter than 16 characters, the function returns at line 562 without ever calling _reset_api_keys. The shared (cached) llm_rails instance is left with its original_config attribute set and its model parameters overwritten with the request's header key — a state that will bleed into the next request that hits the same cached instance.

A finally block is the correct fix:

```python
_set_api_keys(llm_rails, dict(request.headers))
try:
    ...
    # early returns inside try will now trigger finally
finally:
    _reset_api_keys(llm_rails)
```

Without this, any new early-return path added in the future will silently reintroduce the same bug.


nemoguardrails/server/api.py, line 470
_reset_api_keys unreachable on client disconnect

In an async generator, yielding inside a finally block while the generator is being closed (e.g., because the HTTP client disconnects mid-stream) causes Python to raise RuntimeError: async generator ignored GeneratorExit. When that happens, the yield on line 466 throws, and execution never reaches _reset_api_keys on lines 469–470 — leaving the shared llm_rails instance in the modified state permanently.

Moving the reset before the yield ensures it runs regardless:

```python
finally:
    # Reset API keys before yielding [DONE] so they are restored
    # even if the client disconnects (preventing the yield from executing).
    if llm_rails is not None:
        _reset_api_keys(llm_rails)
    # Always send [DONE] event when stream ends
    yield "data: [DONE]\n\n"
```
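The failure mode described above is easy to demonstrate with a plain (synchronous) generator, which follows the same rule: yielding while the generator is being closed raises `RuntimeError`, and any code after that yield in the `finally` never runs.

```python
def stream():
    try:
        yield "data"
    finally:
        # Yielding here while the generator is being closed (e.g. on a
        # client disconnect) raises RuntimeError: generator ignored
        # GeneratorExit -- so cleanup placed after this line never runs.
        yield "data: [DONE]\n\n"

g = stream()
next(g)  # start the stream
try:
    g.close()  # simulate a mid-stream disconnect
    closed_cleanly = True
except RuntimeError:
    closed_cleanly = False  # the finally's yield aborted the close
```

Here `closed_cleanly` ends up `False`, which is why the reset must happen before the final yield.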

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (2)

nemoguardrails/server/api.py, line 547
_set_api_keys called outside the exception-handling block

_set_api_keys (line 547) is invoked between two try blocks. If _init_llms() raises inside it (e.g. ModelInitializationError for an invalid key), the exception propagates past the outer except handlers at lines 661 and 664. As a result:

  1. _reset_api_keys is never called.
  2. The cached llm_rails instance is left with original_config stored, a mutated config.models (header key injected), and llm_rails.llm = None.
  3. Every subsequent request that hits the same cached instance will inherit the broken/mutated config.

Move _set_api_keys inside the inner try block so that any failure is caught and properly cleaned up:

```python
try:
    _set_api_keys(llm_rails, dict(request.headers))
    messages = body.messages or []
    ...
except HTTPException:
    _reset_api_keys(llm_rails)
    raise
except Exception as ex:
    _reset_api_keys(llm_rails)
    ...
```

nemoguardrails/server/api.py, line 322
Empty header value silently replaces the default API key

When a client sends a header like X-Gpt-3.5-Turbo-Authorization: with no value, headers_lower[target_header] is an empty string. The code sets parameters["api_key"] = "", which:

  1. Overwrites the configured default key with an empty string.
  2. Triggers a full LLM reinitialization via _init_llms(), creating a new client that will fail authentication with every real LLM provider.

The accompanying test test_empty_header_value only asserts response.status_code == 200 against the mock server (which never validates the key), masking this production failure.

Consider adding a guard at the start of the if target_header in headers_lower: block to continue when the extracted header value is falsy, so the default configured key is preserved.
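A minimal sketch of the suggested guard (function and variable names are illustrative, not the PR's actual code):

```python
def apply_header_key(parameters: dict, headers_lower: dict, target_header: str) -> None:
    # Illustrative guard: skip falsy header values (missing or empty string)
    # so the deployer-configured default key is preserved.
    value = headers_lower.get(target_header)
    if not value:
        return
    parameters["api_key"] = value

params = {"api_key": "deployer-key-1"}
# An empty header value leaves the configured default untouched.
apply_header_key(params, {"x-gpt-4-authorization": ""}, "x-gpt-4-authorization")
```

With this guard, only a non-empty header value overwrites the configured key, and the empty-header case no longer triggers a reinitialization with a broken credential.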

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (3)

nemoguardrails/server/api.py, line 354
After _reset_api_keys is called, runtime.registered_action_params retains stale LLM instances that were registered with header API keys. While _reset_api_keys restores the config and sets llm_rails.llm = None, it does not call _init_llms() to update the runtime params.

On subsequent requests without a header, if _set_api_keys skips reinitialization (because no matching header is found), the stale LLM instances remain in runtime.registered_action_params. Although the primary code path falls back to self.llm when task-specific LLMs aren't found, the mismatch between restored config and registered runtime params is a structural inconsistency.

To fix this, add an _init_llms() call in _reset_api_keys after restoring the config:

```python
def _reset_api_keys(llm_rails: LLMRails):
    if hasattr(llm_rails, "original_config"):
        llm_rails.config.models = getattr(llm_rails, "original_config")
        llm_rails.llm = None
        for model_config in getattr(llm_rails, "original_config", []):
            if model_config.type != "main":
                model_name = f"{model_config.type}_llm"
                if hasattr(llm_rails, model_name):
                    delattr(llm_rails, model_name)
        delattr(llm_rails, "original_config")
        llm_rails._init_llms()  # <-- restore runtime.registered_action_params
```

nemoguardrails/server/api.py, line 670
For streaming requests, the outer finally block runs before the response generator is consumed by the ASGI server. This causes _reset_api_keys to execute prematurely:

  1. return StreamingResponse(...) is evaluated (generator not yet started)
  2. Outer finally at line 670 immediately calls _reset_api_keys, setting llm_rails.llm = None
  3. FastAPI then iterates the _format_streaming_response generator
  4. The generator tries to consume stream_iterator, but the LLM is now None

This breaks streaming requests with header API keys because cleanup happens before the stream is consumed. The _format_streaming_response function's own finally block becomes a no-op because original_config was already deleted.

To fix this, guard the outer finally to skip cleanup for streaming requests, allowing _format_streaming_response to be the sole cleanup point:

```python
if not body.stream:
    _reset_api_keys(llm_rails)
```

Or better, move the streaming path to a separate try/finally block that doesn't reset until after the generator completes.


nemoguardrails/server/api.py, line 670
The finally block at line 668–670 already handles cleanup for all code paths (normal completion, HTTPException, and other exceptions). The explicit _reset_api_keys calls in the except handlers at lines 657–661 are redundant.

Since _reset_api_keys is idempotent (guarded by hasattr(llm_rails, "original_config")), the exception handlers can be simplified by removing these calls and relying solely on the finally block:

```python
except HTTPException:
    raise
except Exception as ex:
    log.exception(ex)
    return create_error_chat_completion(...)
finally:
    _reset_api_keys(llm_rails)
```

This improves code clarity and removes the appearance of uncertainty about whether finally covers these paths.

…erver

Signed-off-by: Rob Geada <rob@geada.net>
