
fix(guardrails): prevent bypass of blocked Tool Result Policy when context is sensitive #4250

Open
wengkit218-pixel wants to merge 2 commits into archestra-ai:main from wengkit218-pixel:fix/blocked-tool-result-policy-bypass-4225

@wengkit218-pixel
Contributor

Summary

Fixes #4225

When an agent has `Treat context as sensitive from the start of chat` enabled, blocked Tool Result Policies were bypassed: the raw tool result was sent to the model instead of being replaced with the blocked message.

Root Cause

In `platform/backend/src/guardrails/trusted-data.ts`, `evaluateIfContextIsTrusted()` returned early (around line 68) when `considerContextUntrusted` was `true`. That early return skipped the actual Tool Result Policy evaluation (around line 125), so the function returned an empty `toolResultUpdates` and no blocked result was ever replaced.
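To make the root cause concrete, here is a minimal TypeScript sketch of the pre-fix control flow. It is not the code from `trusted-data.ts`: the types, the `policies` map, and the name `evaluateIfContextIsTrustedBuggy` are hypothetical simplifications, assuming a Tool Result Policy is just `allow` or `block` per tool name.

```typescript
// Hypothetical, simplified types: the real shapes in trusted-data.ts may differ.
type ToolResultAction = "allow" | "block";

interface ToolCall {
  id: string;
  toolName: string;
  result: string;
}

interface ToolResultUpdate {
  toolCallId: string;
  content: string;
}

interface TrustEvaluation {
  contextIsTrusted: boolean;
  toolResultUpdates: ToolResultUpdate[];
}

const BLOCKED_MESSAGE = "[Content blocked by policy]";

// Pre-fix control flow (simplified): the early return runs before any
// Tool Result Policy is consulted.
function evaluateIfContextIsTrustedBuggy(
  toolCalls: ToolCall[],
  policies: Map<string, ToolResultAction>,
  considerContextUntrusted: boolean,
): TrustEvaluation {
  if (considerContextUntrusted) {
    // Bug: toolResultUpdates is always empty here, so raw results reach the model.
    return { contextIsTrusted: false, toolResultUpdates: [] };
  }

  // Simplified policy evaluation: only the "block" action is modelled.
  const toolResultUpdates: ToolResultUpdate[] = [];
  for (const call of toolCalls) {
    if (policies.get(call.toolName) === "block") {
      toolResultUpdates.push({ toolCallId: call.id, content: BLOCKED_MESSAGE });
    }
  }
  // Simplified: how the real function decides trust on this path is not modelled.
  return { contextIsTrusted: true, toolResultUpdates };
}

const demoCalls: ToolCall[] = [
  { id: "call-1", toolName: "read_issue", result: "raw issue body" },
];
const demoPolicies = new Map<string, ToolResultAction>([["read_issue", "block"]]);

// Sensitive from the start: the blocked policy is silently skipped.
console.log(
  evaluateIfContextIsTrustedBuggy(demoCalls, demoPolicies, true).toolResultUpdates.length,
); // prints: 0
```

The same `block` policy does produce an update when `considerContextUntrusted` is `false`, which is why the bug only showed up with the sensitive-from-start setting.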

Fix

Instead of returning early when `considerContextUntrusted` is `true`, we now:

  1. Set `contextIsTrusted = false`
  2. Set `unsafeContextBoundary` to mark the context as untrusted
  3. Continue to evaluate Tool Result Policies for all tool calls

This ensures that blocked tool results are replaced with `[Content blocked by policy]` before being sent to the model.
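The three steps above can be sketched in TypeScript as follows. This is a hypothetical simplification rather than the patched code: `UnsafeContextBoundary` is modelled as an object with a `reason` field purely for illustration, and `evaluateIfContextIsTrustedFixed` and the `policies` map are invented names.

```typescript
// Hypothetical, simplified types: the real shapes in trusted-data.ts may differ.
type ToolResultAction = "allow" | "block";

interface ToolCall {
  id: string;
  toolName: string;
  result: string;
}

interface ToolResultUpdate {
  toolCallId: string;
  content: string;
}

// Hypothetical representation of the boundary; the real field may carry other data.
interface UnsafeContextBoundary {
  reason: string;
}

interface TrustEvaluation {
  contextIsTrusted: boolean;
  unsafeContextBoundary?: UnsafeContextBoundary;
  toolResultUpdates: ToolResultUpdate[];
}

const BLOCKED_MESSAGE = "[Content blocked by policy]";

function evaluateIfContextIsTrustedFixed(
  toolCalls: ToolCall[],
  policies: Map<string, ToolResultAction>,
  considerContextUntrusted: boolean,
): TrustEvaluation {
  let contextIsTrusted = true;
  let unsafeContextBoundary: UnsafeContextBoundary | undefined = undefined;

  if (considerContextUntrusted) {
    // Steps 1 and 2: record that the context is untrusted, but do NOT return.
    contextIsTrusted = false;
    unsafeContextBoundary = { reason: "sensitive from start of chat" };
  }

  // Step 3: Tool Result Policies are evaluated on every path.
  const toolResultUpdates: ToolResultUpdate[] = [];
  for (const call of toolCalls) {
    if (policies.get(call.toolName) === "block") {
      toolResultUpdates.push({ toolCallId: call.id, content: BLOCKED_MESSAGE });
    }
  }

  return { contextIsTrusted, unsafeContextBoundary, toolResultUpdates };
}

const demoCalls: ToolCall[] = [
  { id: "call-1", toolName: "read_issue", result: "raw issue body" },
];
const demoPolicies = new Map<string, ToolResultAction>([["read_issue", "block"]]);
const evaluation = evaluateIfContextIsTrustedFixed(demoCalls, demoPolicies, true);
console.log(evaluation.contextIsTrusted); // prints: false
console.log(evaluation.toolResultUpdates[0]?.content); // prints: [Content blocked by policy]
```

In this sketch, `considerContextUntrusted` only changes the trust fields; it no longer short-circuits the policy loop, so a `block` policy applies whether or not the chat starts in a sensitive context.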

Test Plan

To reproduce the bug:

  1. Configure a tool such as `read_issue`
  2. Set the Tool Call Policy so the call is allowed
  3. Set the Tool Result Policy to `Blocked`
  4. Enable `Treat context as sensitive from the start of chat`
  5. Ask the agent to read an issue

Before fix: The raw tool result is sent in the model-facing LLM request.

After fix: The tool result is replaced with `[Content blocked by policy]` before being sent to the model.
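The "after fix" check depends on the updates actually being applied to the model-facing request. Assuming tool results are carried as messages keyed by tool call ID (a hypothetical shape, since the PR does not show the request format), applying `toolResultUpdates` could look like this; `ToolResultMessage` and `applyToolResultUpdates` are invented names for illustration.

```typescript
// Hypothetical shapes: the PR does not show the backend's real LLM request format.
interface ToolResultMessage {
  role: "tool";
  toolCallId: string;
  content: string;
}

interface ToolResultUpdate {
  toolCallId: string;
  content: string;
}

// Returns a new message array in which every tool result with a matching
// update is replaced; messages without an update pass through unchanged.
function applyToolResultUpdates(
  messages: ToolResultMessage[],
  updates: ToolResultUpdate[],
): ToolResultMessage[] {
  const replacementById = new Map<string, string>();
  for (const update of updates) {
    replacementById.set(update.toolCallId, update.content);
  }
  return messages.map((message) => {
    const replacement = replacementById.get(message.toolCallId);
    return replacement === undefined ? message : { ...message, content: replacement };
  });
}

// Mirrors the "after fix" expectation for a blocked read_issue call.
const outgoing = applyToolResultUpdates(
  [{ role: "tool", toolCallId: "call-1", content: "raw issue body" }],
  [{ toolCallId: "call-1", content: "[Content blocked by policy]" }],
);
console.log(outgoing[0]?.content); // prints: [Content blocked by policy]
```

Returning a new array instead of mutating the input is a design choice in this sketch: the raw result stays untouched for any caller that still needs it, while only the redacted copy goes into the request.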

/claim #4225

…ntext is sensitive

When an agent has 'Treat context as sensitive from the start of chat' enabled,
blocked Tool Result Policies were being bypassed. The function returned early
when considerContextUntrusted was true, skipping the Tool Result Policy
evaluation entirely.

Root cause: In trusted-data.ts, evaluateIfContextIsTrusted() returned early
around line 68 when considerContextUntrusted was true, so it skipped the
real Tool Result Policy evaluation around line 125 and returned empty
toolResultUpdates.

Fix: Instead of returning early, we now set contextIsTrusted=false and
unsafeContextBoundary, then continue to evaluate Tool Result Policies.
This ensures blocked tool results are properly replaced with the blocked
message before being sent to the model.

Fixes archestra-ai#4225

/claim archestra-ai#4225
@CLAassistant
Contributor

CLAassistant commented May 1, 2026

CLA assistant check
All committers have signed the CLA.

github-actions bot requested a review from iskhakov on May 5, 2026 at 15:40

Development

Successfully merging this pull request may close these issues.

Blocked Tool Result Policy bypass when agent starts in sensitive context