Feat: Add support for request-time specification of model API keys #1677
RobGeada wants to merge 1 commit into NVIDIA-NeMo:develop from …
Conversation
Greptile Summary
This PR adds support for per-request API key injection via request headers. Issues found:
Additional Comments (1)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemoguardrails/context.py
Line: 23-34
Comment:
`streaming_handler_var` is defined twice (lines 23-24 and 32-34). The second definition overrides the first.
```suggestion
if TYPE_CHECKING:
    from nemoguardrails.logging.explain import ExplainInfo
    from nemoguardrails.logging.stats import LLMStats
    from nemoguardrails.rails.llm.options import GenerationOptions
    from nemoguardrails.streaming import StreamingHandler

streaming_handler_var: contextvars.ContextVar[Optional["StreamingHandler"]] = contextvars.ContextVar(
    "streaming_handler", default=None
)
```
How can I resolve this? If you propose a fix, please make it concise.
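As background for why the duplicate definition matters: rebinding a module-level `ContextVar` name creates a brand-new variable rather than updating the old one, so any module that imported a reference to the first definition stops seeing updates. A minimal standalone sketch (illustrative only, not NeMo Guardrails code):

```python
import contextvars

# First definition (like lines 23-24 of context.py)
streaming_handler_var = contextvars.ContextVar("streaming_handler", default=None)
first_ref = streaming_handler_var  # another module imports this reference

# Second definition (like lines 32-34) rebinds the name to a NEW ContextVar
streaming_handler_var = contextvars.ContextVar("streaming_handler", default=None)

streaming_handler_var.set("handler-A")
print(first_ref.get())              # None: the original variable never saw the set
print(streaming_handler_var.get())  # handler-A
```

This is why keeping a single definition, as in the suggestion above, avoids subtle lookup mismatches.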
Force-pushed from 00992f2 to 4f0d4bf
Codecov Report
❌ Patch coverage is …
Force-pushed from 64ac9d4 to 0e6672c
Documentation preview
Force-pushed from 0e6672c to 3a9c20f
Additional Comments (1)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemoguardrails/server/api.py
Line: 552-579
Comment:
**`_reset_api_keys` skipped on early-exit paths**
After `_set_api_keys` modifies the shared `llm_rails` instance (lines 305–328), there are two early-return/raise paths in the second `try` block that never call `_reset_api_keys`, leaving the cached instance with a modified config and a dangling `original_config` attribute:
1. **Line 557**: `return create_error_chat_completion(...)` for a short `thread_id` — the function returns before the `finally: _reset_api_keys(llm_rails)` block.
2. **Lines 576–579**: `raise HTTPException(...)` for an invalid state format — the exception propagates out of the `try` block without calling reset.
In both cases, the next request that hits the same cached `llm_rails` instance will use the previous request's mutated config as its "original" to back up and restore.
Consider wrapping the entire second `try` block with a `finally` that calls `_reset_api_keys`, or checking `hasattr(llm_rails, "original_config")` at the top and ensuring cleanup happens on all exit paths.
How can I resolve this? If you propose a fix, please make it concise.
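For reference, Python's `finally` runs on both an early `return` and a propagating exception, which is exactly why wrapping the whole block in `try`/`finally` closes both leak paths described above. A toy sketch with stand-in names (not the actual server code):

```python
calls = []

def _reset():
    calls.append("reset")

def short_thread_id():
    try:
        return "error-completion"      # early return, like the line-557 path
    finally:
        _reset()                       # still executes before returning

def invalid_state():
    try:
        raise ValueError("bad state")  # like the HTTPException on lines 576-579
    finally:
        _reset()                       # still executes before propagation

short_thread_id()
try:
    invalid_state()
except ValueError:
    pass
print(calls)  # ['reset', 'reset']
```

Both exit paths trigger the cleanup, so no future early return can skip it.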
Force-pushed from b28631d to 97d6aab
Force-pushed from 97d6aab to d37cfe8
Additional Comments (2)
A `try`/`finally` wrapper guarantees the reset on every exit path:

```python
_set_api_keys(llm_rails, dict(request.headers))
try:
    ...
    # early returns inside try will now trigger finally
finally:
    _reset_api_keys(llm_rails)
```

Without this, any new early-return path added in the future will silently reintroduce the same bug.
In an async generator, yielding inside a `finally` block is not guaranteed to complete if the client disconnects. Moving the reset before the yield ensures it runs regardless:

```python
finally:
    # Reset API keys before yielding [DONE] so they are restored
    # even if the client disconnects (preventing the yield from executing).
    if llm_rails is not None:
        _reset_api_keys(llm_rails)
    # Always send [DONE] event when stream ends
    yield "data: [DONE]\n\n"
```
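The disconnect behavior can be reproduced with a toy async generator: when the consumer calls `aclose()` (which is what happens on client disconnect), statements placed before the `yield` in `finally` still run, while the final `yield` is never delivered. A standalone sketch, not the actual server code:

```python
import asyncio

events = []

async def stream():
    try:
        yield "chunk"
    finally:
        events.append("reset")        # runs even when the generator is closed early
        yield "data: [DONE]\n\n"      # yielding during close raises RuntimeError

async def main():
    gen = stream()
    events.append(await gen.__anext__())  # consume the first chunk
    try:
        await gen.aclose()                # simulates the client disconnecting
    except RuntimeError:
        # PEP 525: an async generator that yields while being closed
        # "ignored GeneratorExit" and aclose() raises RuntimeError
        events.append("DONE never sent")

asyncio.run(main())
print(events)  # ['chunk', 'reset', 'DONE never sent']
```

This confirms the ordering recommended above: put the reset before the `[DONE]` yield so cleanup survives a disconnect.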
Additional Comments (2)
Move `_set_api_keys` inside the `try` block so the `except` handlers can restore the keys:

```python
try:
    _set_api_keys(llm_rails, dict(request.headers))
    messages = body.messages or []
    ...
except HTTPException:
    _reset_api_keys(llm_rails)
    raise
except Exception as ex:
    _reset_api_keys(llm_rails)
    ...
```
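The reason the setup call belongs inside the `try` is that `except` handlers only catch exceptions raised after the `try` is entered; a failure between the setup and the `try` would skip cleanup entirely. A toy sketch with stand-in names (not the server code):

```python
state = {"keys": "original"}

def set_keys():
    state["keys"] = "per-request"

def reset_keys():
    state["keys"] = "original"

def handle(fail: bool):
    try:
        set_keys()            # inside try: any later failure reaches except
        if fail:
            raise RuntimeError("downstream failure")
    except Exception:
        reset_keys()          # cleanup on the error path
        raise

try:
    handle(fail=True)
except RuntimeError:
    pass
print(state["keys"])  # original
```

After the failing request, the shared state is back to its original value instead of leaking the per-request keys.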
When a client sends a header like …

The accompanying test …

Consider adding a guard at the start of the …
Force-pushed from d37cfe8 to a3a103b
Additional Comments (3)
On subsequent requests without a header, if … To fix this, add an `_init_llms()` call:

```python
def _reset_api_keys(llm_rails: LLMRails):
    if hasattr(llm_rails, "original_config"):
        llm_rails.config.models = getattr(llm_rails, "original_config")
        llm_rails.llm = None
        for model_config in getattr(llm_rails, "original_config", []):
            if model_config.type != "main":
                model_name = f"{model_config.type}_llm"
                if hasattr(llm_rails, model_name):
                    delattr(llm_rails, model_name)
        delattr(llm_rails, "original_config")
        llm_rails._init_llms()  # <-- restore runtime.registered_action_params
```
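The backup/restore pattern used here (stash the original on the instance, restore and `delattr` on reset) can be exercised with a plain object; all names below are illustrative stand-ins, not the actual `LLMRails` API:

```python
class FakeRails:
    """Stand-in for the cached llm_rails instance."""
    def __init__(self):
        self.config_models = ["main-model"]

def set_override(rails, new_models):
    # Back up only once, so repeated overrides keep the true original.
    if not hasattr(rails, "original_config"):
        rails.original_config = rails.config_models
    rails.config_models = new_models

def reset_override(rails):
    # Restore the backup and remove the marker attribute.
    if hasattr(rails, "original_config"):
        rails.config_models = rails.original_config
        delattr(rails, "original_config")

rails = FakeRails()
set_override(rails, ["per-request-model"])
reset_override(rails)
print(rails.config_models)                 # ['main-model']
print(hasattr(rails, "original_config"))   # False
```

The `hasattr` guard makes the reset idempotent, so calling it on a request that never set any override is harmless.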
This breaks streaming requests with header API keys because cleanup happens before the stream is consumed. The … To fix this, guard the outer reset:

```python
if not body.stream:
    _reset_api_keys(llm_rails)
```

Or better, move the streaming path to a separate try/finally block that doesn't reset until after the generator completes.
Since the reset is needed on every exit path, the handlers can be restructured as:

```python
except HTTPException:
    raise
except Exception as ex:
    log.exception(ex)
    return create_error_chat_completion(...)
finally:
    _reset_api_keys(llm_rails)
```

This improves code clarity and removes the appearance of uncertainty about whether …
…erver

Signed-off-by: Rob Geada <rob@geada.net>
Force-pushed from a3a103b to 7ba4735
Description
This PR addresses #1676, introducing the ability to specify per-model API keys via request headers.
Authorization API keys can be provided in the form:
`X-{Model Name}-Authorization`, e.g., `X-GPT-4-Authorization`. The provided API token in the header will then be sent to all models that match `{Model Name}`. Discussion of the rationale and motivation for this change is in the linked issue. Thanks!
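To make the header scheme concrete, here is a small sketch of how a client might build such a header name; the `auth_header_name` helper is hypothetical, not part of the PR:

```python
def auth_header_name(model_name: str) -> str:
    # Header form from the PR description: X-{Model Name}-Authorization
    return f"X-{model_name}-Authorization"

# Hypothetical per-request headers for a model named "GPT-4"
headers = {auth_header_name("GPT-4"): "Bearer sk-example"}
print(list(headers))  # ['X-GPT-4-Authorization']
```

The server then routes the token to every configured model whose name matches `GPT-4`.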
Related Issue(s)
Implements #1676
Checklist