Merged
4 changes: 2 additions & 2 deletions README.md
@@ -295,9 +295,9 @@ Evaluating the safety of a LLM-based conversational application is a complex task

## How is this different?

There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence), critique chains (e.g. constitutional chain), parsing the output (e.g. guardrails.ai), individual guardrails (e.g., LLM-Guard), hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).
There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence, PolicyAI), critique chains (e.g. constitutional chain), parsing the output (e.g. guardrails.ai), individual guardrails (e.g., LLM-Guard), hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).

NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, AlignScore and LangChain chains.
NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, PolicyAI, AlignScore and LangChain chains.

To the best of our knowledge, NeMo Guardrails is the only guardrails toolkit that also offers a solution for modeling the dialog between the user and the LLM. On one hand, this makes it possible to guide the dialog in a precise way. On the other hand, it enables fine-grained control over when certain guardrails should be used, e.g., applying fact-checking only to certain types of questions.

116 changes: 116 additions & 0 deletions docs/configure-rails/guardrail-catalog/community/policyai.md
@@ -0,0 +1,116 @@
# PolicyAI Integration

NeMo Guardrails supports using the [PolicyAI](https://musubilabs.ai) content moderation API as an input and output rail out-of-the-box (you need to have the `POLICYAI_API_KEY` environment variable set).

PolicyAI provides flexible policy-based content moderation, allowing you to define custom policies for your specific use cases and manage them through tags.

## Setup

1. Sign up for a PolicyAI account at [musubilabs.ai](https://musubilabs.ai)
2. Create your policies and organize them with tags
3. Set the required environment variables:

```bash
export POLICYAI_API_KEY="your-api-key"
export POLICYAI_BASE_URL="https://api.musubilabs.ai" # Optional, this is the default
export POLICYAI_TAG_NAME="prod" # Optional, defaults to "prod"
```

## Usage

### Basic Input Moderation

```yaml
rails:
input:
flows:
- policyai moderation on input
```

### Basic Output Moderation

```yaml
rails:
output:
flows:
- policyai moderation on output
```

### Using Different Tags

To use different policy tags for different environments, set the `POLICYAI_TAG_NAME` environment variable:

```bash
# For staging environment
export POLICYAI_TAG_NAME="staging"

# For production environment
export POLICYAI_TAG_NAME="prod"
```

## Complete Example

```yaml
models:
- type: main
engine: openai
model: gpt-4

rails:
input:
flows:
- policyai moderation on input

output:
flows:
- policyai moderation on output
```

## How It Works

1. **Input Rails**: When a user sends a message, PolicyAI evaluates it against all policies attached to the configured tag. If any policy returns `UNSAFE`, the message is blocked.

2. **Output Rails**: Before the bot's response is sent to the user, PolicyAI evaluates it. If the content violates any policy, the response is replaced with a refusal message.
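The aggregation rule behind both rails can be sketched as follows. This is a minimal illustration of the "any UNSAFE policy blocks the content" behavior described above; the helper function and the sample result dicts are illustrative, not part of the PolicyAI API.

```python
# Sketch of the aggregation rule used by the PolicyAI rails: content is
# blocked as soon as ANY policy attached to the configured tag returns
# UNSAFE. The result dicts are illustrative examples only.

def aggregate(results: list) -> dict:
    """Fold per-policy results into one overall assessment."""
    for result in results:
        if result.get("assessment", "SAFE") == "UNSAFE":
            # First UNSAFE result determines the reported category/severity.
            return {
                "assessment": "UNSAFE",
                "category": result.get("category", "Unknown"),
                "severity": result.get("severity", 0),
            }
    return {"assessment": "SAFE", "category": "Safe", "severity": 0}

overall = aggregate([
    {"assessment": "SAFE"},
    {"assessment": "UNSAFE", "category": "Hate Speech", "severity": 2},
])
print(overall["assessment"])  # UNSAFE
```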

## Response Format

PolicyAI returns the following information for each evaluation:

- `assessment`: `"SAFE"` or `"UNSAFE"`
- `category`: The category of violation (if UNSAFE)
- `severity`: Severity level from 0 (safe) to 3 (high severity)
- `reason`: Human-readable explanation
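For illustration, a flagged evaluation might look like the following (the field values are hypothetical). The rail blocks content exactly when `assessment` is `"UNSAFE"`:

```python
# Hypothetical evaluation result, shaped like the fields listed above.
result = {
    "assessment": "UNSAFE",
    "category": "Prompt Injection",
    "severity": 2,
    "reason": "The message attempts to override system instructions.",
}

# Blocking decision: block if and only if the assessment is UNSAFE.
should_block = result.get("assessment", "SAFE") == "UNSAFE"
print(should_block)  # True
```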

## Customizing Behavior

To customize the behavior when content is flagged, you can override the default flows in your config:

```text
define subflow policyai moderation on input
"""Custom PolicyAI input moderation."""
$result = execute call_policyai_api(text=$user_message)

if $result.assessment == "UNSAFE"
bot inform content policy violation
stop

define bot inform content policy violation
"I'm sorry, but I cannot process that request. Please rephrase your message."
```

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `POLICYAI_API_KEY` | Yes | - | Your PolicyAI API key |
| `POLICYAI_BASE_URL` | No | `https://api.musubilabs.ai` | PolicyAI API base URL |
| `POLICYAI_TAG_NAME` | No | `prod` | Default policy tag to use |

## Error Handling

If the PolicyAI API is unavailable or returns an error, the action raises an exception, so the rail fails closed by default. To implement fail-open behavior instead, wrap the API call in a custom Python action that catches the exception and returns a default assessment, and reference that action from your custom flows.

## Learn More

- [PolicyAI Documentation](https://docs.musubilabs.ai)
- [Musubi Labs](https://musubilabs.ai)
21 changes: 21 additions & 0 deletions docs/configure-rails/guardrail-catalog/third-party.md
@@ -34,6 +34,26 @@ rails:

For more details, check out the [ActiveFence Integration](community/active-fence.md) page.

## PolicyAI

The NeMo Guardrails library supports using [PolicyAI](https://musubilabs.ai) by Musubi Labs as an input and output rail out-of-the-box (you need to have the `POLICYAI_API_KEY` environment variable set).

PolicyAI provides policy-based content moderation, allowing you to define custom policies and organize them with tags for environment-based management.

### Example usage

```yaml
rails:
input:
flows:
- policyai moderation on input
output:
flows:
- policyai moderation on output
```

For more details, check out the [PolicyAI Integration](community/policyai.md) page.

## AutoAlign

The NeMo Guardrails library supports using the AutoAlign's guardrails API (you need to have the `AUTOALIGN_API_KEY` environment variable set).
@@ -283,6 +303,7 @@ Llama Guard <community/llama-guard>
Pangea AI Guard <community/pangea>
Patronus Evaluate API <community/patronus-evaluate-api>
Patronus Lynx <community/patronus-lynx>
PolicyAI <community/policyai>
Presidio <community/presidio>
Private AI <community/privateai>
Prompt Security <community/prompt-security>
14 changes: 14 additions & 0 deletions nemoguardrails/library/policyai/__init__.py
@@ -0,0 +1,14 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
159 changes: 159 additions & 0 deletions nemoguardrails/library/policyai/actions.py
@@ -0,0 +1,159 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
PolicyAI Integration for NeMo Guardrails.

PolicyAI provides content moderation and policy enforcement capabilities
for LLM applications. This integration allows using PolicyAI as an input
and output rail for content moderation.

For more information, see: https://musubilabs.ai
"""

import json
import logging
import os
from typing import Optional

import aiohttp

from nemoguardrails.actions import action

log = logging.getLogger(__name__)


def call_policyai_api_mapping(result: dict) -> bool:
"""
Mapping for call_policyai_api.

Expects result to be a dict with:
- "assessment": "SAFE" or "UNSAFE"
- "category": the violation category (if UNSAFE)
- "severity": severity level 0-3

Block (return True) if:
1. Assessment is "UNSAFE"
"""
assessment = result.get("assessment", "SAFE")
return assessment == "UNSAFE"


@action(is_system_action=True, output_mapping=call_policyai_api_mapping)
async def call_policyai_api(
text: Optional[str] = None,
tag_name: Optional[str] = None,
**kwargs,
):
"""
Call the PolicyAI API to evaluate content.

Args:
text: The text content to evaluate.
tag_name: Optional tag name for the PolicyAI evaluation.
If not provided, uses POLICYAI_TAG_NAME env var or "prod".

Returns:
dict with:
- assessment: "SAFE" or "UNSAFE"
- category: the violation category (if UNSAFE)
- severity: severity level 0-3
- reason: explanation for the decision
"""
api_key = os.environ.get("POLICYAI_API_KEY")

if api_key is None:
raise ValueError("POLICYAI_API_KEY environment variable not set.")

base_url = os.environ.get("POLICYAI_BASE_URL", "https://api.musubilabs.ai")
base_url = base_url.rstrip("/")

# Get tag name from parameter, env var, or default
if tag_name is None:
tag_name = os.environ.get("POLICYAI_TAG_NAME", "prod")

url = f"{base_url}/policyai/v1/decisions/evaluate/{tag_name}"

headers = {
"Musubi-Api-Key": api_key,
"Content-Type": "application/json",
}

data = {
"content": [
{
"type": "TEXT",
"content": text,
}
],
}

timeout = aiohttp.ClientTimeout(total=30)
async with aiohttp.ClientSession(timeout=timeout) as session:
async with session.post(
url=url,
headers=headers,
json=data,
) as response:
if response.status != 200:
raise ValueError(
f"PolicyAI call failed with status code {response.status}.\nDetails: {await response.text()}"
)
response_json = await response.json()
log.info(json.dumps(response_json, indent=2))

# PolicyAI returns results in "data" array for tag-based evaluation
results = response_json.get("data", [])

# Fail-closed: If no policies are attached to the tag, raise an error
# rather than silently allowing content through
if not results:
raise ValueError(
f"PolicyAI returned no policy results for tag '{tag_name}'. "
"Ensure policies are attached to this tag."
)

# Check if all policies failed evaluation
successful_results = [r for r in results if r.get("status") != "failed"]
if not successful_results:
raise ValueError(
f"All PolicyAI policy evaluations failed for tag '{tag_name}'. Check policy configurations."
)

# Aggregate results - if ANY policy returns UNSAFE, overall is UNSAFE
overall_assessment = "SAFE"
triggered_category = "Safe"
max_severity = 0
reason = "Content passed all policy checks"

for result in successful_results:
assessment = result.get("assessment", "SAFE")
if assessment == "UNSAFE":
overall_assessment = "UNSAFE"
triggered_category = result.get("category", "Unknown")
max_severity = max(max_severity, result.get("severity", 0))
reason = result.get("reason", "Policy violation detected")
break # Stop at first UNSAFE result

# Pre-format exception message for Colang 1.x compatibility
# (Colang 1.x doesn't support string concatenation in create event)
exception_message = f"PolicyAI moderation triggered. Content violated policy: {triggered_category}"

return {
"assessment": overall_assessment,
"category": triggered_category,
"severity": max_severity,
"reason": reason,
"exception_message": exception_message,
}
33 changes: 33 additions & 0 deletions nemoguardrails/library/policyai/flows.co
@@ -0,0 +1,33 @@
"""
PolicyAI Integration Flows (Colang 2.x)

PolicyAI provides content moderation and policy enforcement capabilities.
For more information, see: https://musubilabs.ai

Supported features:
- Input moderation: Check user messages against configured policies
- Output moderation: Check bot responses against configured policies
- Tag-based evaluation: Use POLICYAI_TAG_NAME env var to specify policy tag
"""

flow policyai moderation on input
"""Guardrail based on PolicyAI assessment."""
$result = await CallPolicyaiApiAction(text=$user_message)

if $result.assessment == "UNSAFE"
if $system.config.enable_rails_exceptions
send PolicyAIModerationRailException(message="PolicyAI moderation triggered. Content violated policy: " + $result.category)
else
bot refuse to respond
abort

flow policyai moderation on output
"""Guardrail based on PolicyAI assessment."""
$result = await CallPolicyaiApiAction(text=$bot_message)

if $result.assessment == "UNSAFE"
if $system.config.enable_rails_exceptions
send PolicyAIModerationRailException(message="PolicyAI moderation triggered. Content violated policy: " + $result.category)
else
bot refuse to respond
abort