
Feat: Add support for request-time specification of model API keys#1677

Open
RobGeada wants to merge 1 commit into NVIDIA-NeMo:develop from RobGeada:header-api-keys

Conversation

@RobGeada
Contributor

@RobGeada RobGeada commented Feb 27, 2026

Description

This PR addresses #1676, introducing the ability to specify per-model API keys via request headers.

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      openai_api_key: 'deployer-key-1'
  - type: self-check
    engine: openai
    model: gpt-5
    parameters:
      openai_api_key: 'deployer-key-2'
```

Authorization API keys can be provided via headers of the form `X-{Model Name}-Authorization`, e.g., `X-GPT-4-Authorization`. The API token provided in the header will then be sent to all models whose name matches `{Model Name}`.

Discussion of the rationale and motivation for this change is in the linked issue. Thanks!

Related Issue(s)

Implements #1676

Checklist

  • [x] I've read the CONTRIBUTING guidelines.
  • [x] I've updated the documentation if applicable.
  • [x] I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR adds support for per-request API key injection via X-{model-name}-Authorization HTTP headers, enabling multi-tenant scenarios where callers can supply their own LLM credentials at request time. The implementation patches the shared llm_rails config and reinitializes LLMs before each request, then resets afterward.

Issues found:

  • Stale runtime params after reset: _reset_api_keys restores config.models and sets llm.llm = None, but never updates runtime.registered_action_params. This leaves header-key LLM instances registered even after the request completes. While the primary fallback path uses self.llm, the mismatch between restored config and retained runtime params is a structural inconsistency that should be fixed by calling _init_llms() in _reset_api_keys.

  • Streaming finally ordering bug: For streaming requests, return StreamingResponse(...) causes the outer finally block to execute before the generator is consumed by the ASGI server. This causes _reset_api_keys to run prematurely, setting llm_rails.llm = None before the stream even starts. The inner _format_streaming_response finally block then becomes a no-op, leaving streaming requests without proper key reset.

  • Redundant _reset_api_keys calls: The explicit reset calls in the except handlers (lines 657–661) are unnecessary since the finally block covers all paths. Removing them would clarify that cleanup happens unconditionally.
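The streaming ordering issue can be reproduced with a minimal, framework-free sketch: a `finally` attached to the function that *returns* a generator runs when the function returns, before the generator is ever consumed.

```python
events = []

def handler():
    # Minimal sketch of the ordering bug: the finally block fires when
    # handler() returns the generator, before any chunk is consumed.
    def stream():
        events.append("first chunk consumed")
        yield "chunk"
    try:
        return stream()
    finally:
        events.append("cleanup")

gen = handler()    # "cleanup" is recorded here, before the stream starts
first = next(gen)  # only now is the first chunk produced
```

After this runs, `events` is `["cleanup", "first chunk consumed"]`, which is exactly the premature-reset ordering described above.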

Confidence Score: 2/5

  • Not safe to merge — two logic bugs affect correctness of API key isolation: stale runtime params persist across requests, and streaming requests reset keys before they're consumed.
  • The core feature has three issues: (1) _reset_api_keys doesn't call _init_llms(), leaving registered LLM instances with header keys in the runtime params after reset; (2) for streaming, the outer finally runs before the generator is consumed, prematurely resetting keys and breaking stream processing; (3) redundant reset calls reduce code clarity. Issues 1 and 2 are correctness problems that could cause request failures or key misuse. Issue 3 is a style improvement. The implementation needs fixes to the reset logic before it's safe for production.
  • nemoguardrails/server/api.py — specifically _reset_api_keys (missing _init_llms() call) and the streaming request path's finally ordering.

Last reviewed commit: a3a103b

Contributor

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 5 comments


@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Additional Comments (1)

nemoguardrails/context.py
streaming_handler_var is defined twice (lines 23-24 and 32-34). The second definition overrides the first.

```python
if TYPE_CHECKING:
    from nemoguardrails.logging.explain import ExplainInfo
    from nemoguardrails.logging.stats import LLMStats
    from nemoguardrails.rails.llm.options import GenerationOptions
    from nemoguardrails.streaming import StreamingHandler

streaming_handler_var: contextvars.ContextVar[Optional["StreamingHandler"]] = contextvars.ContextVar(
    "streaming_handler", default=None
)
```

@codecov

codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 42.85714% with 20 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| nemoguardrails/server/api.py | 42.85% | 20 Missing ⚠️ |


@RobGeada
Contributor Author

RobGeada commented Mar 1, 2026

@greptileai

@RobGeada force-pushed the header-api-keys branch 2 times, most recently from 64ac9d4 to 0e6672c on March 4, 2026, 16:00
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1677

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

nemoguardrails/server/api.py
_reset_api_keys skipped on early-exit paths

After _set_api_keys modifies the shared llm_rails instance (lines 305–328), there are two early-return/raise paths in the second try block that never call _reset_api_keys, leaving the cached instance with a modified config and a dangling original_config attribute:

  1. Line 557: return create_error_chat_completion(...) for a short thread_id — the function returns before the finally: _reset_api_keys(llm_rails) block.
  2. Lines 576–579: raise HTTPException(...) for an invalid state format — the exception propagates out of the try block without calling reset.

In both cases, the next request that hits the same cached llm_rails instance will use the previous request's mutated config as its "original" to back up and restore.

Consider wrapping the entire second try block with a finally that calls _reset_api_keys, or checking hasattr(llm_rails, "original_config") at the top and ensuring cleanup happens on all exit paths.
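The point about early exits generalizes: in Python, a `return` (or `raise`) inside a `try` still runs the `finally`, so wrapping the whole block is a robust way to guarantee cleanup. A minimal sketch (names are illustrative, not the PR's actual code):

```python
cleanup_ran = []

def handle(thread_id: str) -> str:
    # Illustrative sketch: cleanup in finally runs on every exit path,
    # including early returns and raised exceptions.
    try:
        if len(thread_id) < 16:
            return "error: thread_id too short"  # early return still hits finally
        return "ok"
    finally:
        cleanup_ran.append(thread_id)

result = handle("short")  # cleanup_ran now contains "short"
```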


@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (2)

nemoguardrails/server/api.py, line 566
Early return bypasses _reset_api_keys

_set_api_keys is called at line 547, before the try block (which starts at line 549). When body.guardrails.thread_id exists but is shorter than 16 characters, the function returns at line 562 without ever calling _reset_api_keys. The shared (cached) llm_rails instance is left with its original_config attribute set and its model parameters overwritten with the request's header key — a state that will bleed into the next request that hits the same cached instance.

A finally block is the correct fix:

```python
_set_api_keys(llm_rails, dict(request.headers))
try:
    ...
    # early returns inside try will now trigger finally
finally:
    _reset_api_keys(llm_rails)
```

Without this, any new early-return path added in the future will silently reintroduce the same bug.


nemoguardrails/server/api.py, line 470
_reset_api_keys unreachable on client disconnect

In an async generator, yielding inside a finally block while the generator is being closed (e.g., because the HTTP client disconnects mid-stream) causes Python to raise RuntimeError: async generator ignored GeneratorExit. When that happens, the yield on line 466 throws, and execution never reaches _reset_api_keys on lines 469–470 — leaving the shared llm_rails instance in the modified state permanently.

Moving the reset before the yield ensures it runs regardless:

```python
finally:
    # Reset API keys before yielding [DONE] so they are restored
    # even if the client disconnects (preventing the yield from executing).
    if llm_rails is not None:
        _reset_api_keys(llm_rails)
    # Always send [DONE] event when stream ends
    yield "data: [DONE]\n\n"
```
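The failure mode described above is easy to demonstrate with a plain (synchronous) generator, which follows the same rule: yielding while the generator is being closed raises `RuntimeError`, and any code after that yield in the `finally` never runs.

```python
def stream():
    try:
        yield "data"
    finally:
        # Yielding here while the generator is being closed (e.g. on a
        # client disconnect) raises RuntimeError: generator ignored
        # GeneratorExit -- so cleanup placed after this line never runs.
        yield "data: [DONE]\n\n"

g = stream()
next(g)  # start the stream
try:
    g.close()  # simulate a mid-stream disconnect
    closed_cleanly = True
except RuntimeError:
    closed_cleanly = False  # the finally's yield aborted the close
```

Here `closed_cleanly` ends up `False`, which is why the reset must happen before the final yield.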

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (2)

nemoguardrails/server/api.py, line 547
_set_api_keys called outside the exception-handling block

_set_api_keys (line 547) is invoked between two try blocks. If _init_llms() raises inside it (e.g. ModelInitializationError for an invalid key), the exception propagates past the outer except handlers at lines 661 and 664. As a result:

  1. _reset_api_keys is never called.
  2. The cached llm_rails instance is left with original_config stored, a mutated config.models (header key injected), and llm_rails.llm = None.
  3. Every subsequent request that hits the same cached instance will inherit the broken/mutated config.

Move _set_api_keys inside the inner try block so that any failure is caught and properly cleaned up:

```python
try:
    _set_api_keys(llm_rails, dict(request.headers))
    messages = body.messages or []
    ...
except HTTPException:
    _reset_api_keys(llm_rails)
    raise
except Exception as ex:
    _reset_api_keys(llm_rails)
    ...
```

nemoguardrails/server/api.py, line 322
Empty header value silently replaces the default API key

When a client sends a header like X-Gpt-3.5-Turbo-Authorization: with no value, headers_lower[target_header] is an empty string. The code sets parameters["api_key"] = "", which:

  1. Overwrites the configured default key with an empty string.
  2. Triggers a full LLM reinitialization via _init_llms(), creating a new client that will fail authentication with every real LLM provider.

The accompanying test test_empty_header_value only asserts response.status_code == 200 against the mock server (which never validates the key), masking this production failure.

Consider adding a guard at the start of the if target_header in headers_lower: block to continue when the extracted header value is falsy, so the default configured key is preserved.
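A minimal sketch of the suggested guard (function and variable names are illustrative, not the PR's actual code):

```python
def apply_header_key(parameters: dict, headers_lower: dict, target_header: str) -> None:
    # Illustrative guard: skip falsy header values (missing or empty string)
    # so the deployer-configured default key is preserved.
    value = headers_lower.get(target_header)
    if not value:
        return
    parameters["api_key"] = value

params = {"api_key": "deployer-key-1"}
# An empty header value leaves the configured default untouched.
apply_header_key(params, {"x-gpt-4-authorization": ""}, "x-gpt-4-authorization")
```

With this guard, only a non-empty header value overwrites the configured key, and the empty-header case no longer triggers a reinitialization with a broken credential.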

@RobGeada
Contributor Author

RobGeada commented Mar 4, 2026

@greptileai

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (3)

nemoguardrails/server/api.py, line 354
After _reset_api_keys is called, runtime.registered_action_params retains stale LLM instances that were registered with header API keys. While _reset_api_keys restores the config and sets llm_rails.llm = None, it does not call _init_llms() to update the runtime params.

On subsequent requests without a header, if _set_api_keys skips reinitialization (because no matching header is found), the stale LLM instances remain in runtime.registered_action_params. Although the primary code path falls back to self.llm when task-specific LLMs aren't found, the mismatch between restored config and registered runtime params is a structural inconsistency.

To fix this, add an _init_llms() call in _reset_api_keys after restoring the config:

```python
def _reset_api_keys(llm_rails: LLMRails):
    if hasattr(llm_rails, "original_config"):
        llm_rails.config.models = getattr(llm_rails, "original_config")
        llm_rails.llm = None
        for model_config in getattr(llm_rails, "original_config", []):
            if model_config.type != "main":
                model_name = f"{model_config.type}_llm"
                if hasattr(llm_rails, model_name):
                    delattr(llm_rails, model_name)
        delattr(llm_rails, "original_config")
        llm_rails._init_llms()  # <-- restore runtime.registered_action_params
```

nemoguardrails/server/api.py, line 670
For streaming requests, the outer finally block runs before the response generator is consumed by the ASGI server. This causes _reset_api_keys to execute prematurely:

  1. return StreamingResponse(...) is evaluated (generator not yet started)
  2. Outer finally at line 670 immediately calls _reset_api_keys, setting llm_rails.llm = None
  3. FastAPI then iterates the _format_streaming_response generator
  4. The generator tries to consume stream_iterator, but the LLM is now None

This breaks streaming requests with header API keys because cleanup happens before the stream is consumed. The _format_streaming_response function's own finally block becomes a no-op because original_config was already deleted.

To fix this, guard the outer finally to skip cleanup for streaming requests, allowing _format_streaming_response to be the sole cleanup point:

```python
if not body.stream:
    _reset_api_keys(llm_rails)
```

Or better, move the streaming path to a separate try/finally block that doesn't reset until after the generator completes.


nemoguardrails/server/api.py, line 670
The finally block at line 668–670 already handles cleanup for all code paths (normal completion, HTTPException, and other exceptions). The explicit _reset_api_keys calls in the except handlers at lines 657–661 are redundant.

Since _reset_api_keys is idempotent (guarded by hasattr(llm_rails, "original_config")), the exception handlers can be simplified by removing these calls and relying solely on the finally block:

```python
except HTTPException:
    raise
except Exception as ex:
    log.exception(ex)
    return create_error_chat_completion(...)
finally:
    _reset_api_keys(llm_rails)
```

This improves code clarity and removes the appearance of uncertainty about whether finally covers these paths.

…erver

Signed-off-by: Rob Geada <rob@geada.net>
