Summary
JudgeAgent throws AttributeError: 'str' object has no attribute 'values' when the LLM returns the criteria field as a JSON string instead of a dictionary object. This is an intermittent issue that occurs when the LLM misinterprets the function calling schema.
Environment
- Package: langwatch-scenario
- Version: 0.7.14
- Python Version: 3.13.4
- OS: macOS
Description
When using JudgeAgent with a list of criteria, the agent sometimes fails with:
```
AttributeError: 'str' object has no attribute 'values'
```
This occurs in scenario/judge_agent.py at line 200 (and 205) when the code attempts to call criteria.values() on what it expects to be a dictionary, but is actually a string.
Root Cause
The issue occurs in the JudgeAgent.call() method when parsing the LLM's function call response:
```python
# Line 193: Parse tool call arguments
args = json.loads(tool_call.function.arguments)
criteria = args.get("criteria", {})

# Line 200: Fails if criteria is a string
for idx, criterion in enumerate(criteria.values()):  # ❌ AttributeError if criteria is str
    ...
```

Expected behavior: The LLM should return `criteria` as a dictionary object:
```json
{
  "criteria": {
    "criterion_0": "true",
    "criterion_1": "false"
  }
}
```

Actual behavior (when bug occurs): The LLM sometimes returns `criteria` as a JSON string:

```json
{
  "criteria": "{\"criterion_0\": \"true\", \"criterion_1\": \"false\"}"
}
```

When `json.loads()` parses the outer JSON, `criteria` remains a string instead of being parsed as a dictionary.
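The difference between the two payloads can be verified in isolation with the standard `json` module (the payload values here are illustrative, not taken from a real LLM response):

```python
import json

well_formed = '{"criteria": {"criterion_0": "true", "criterion_1": "false"}}'
double_encoded = '{"criteria": "{\\"criterion_0\\": \\"true\\", \\"criterion_1\\": \\"false\\"}"}'

# Normal case: the nested object parses to a dict
criteria = json.loads(well_formed)["criteria"]
print(type(criteria).__name__)   # dict
print(list(criteria.values()))   # ['true', 'false']

# Bug case: json.loads() does not recurse into string values,
# so the double-encoded inner payload stays a str
criteria = json.loads(double_encoded)["criteria"]
print(type(criteria).__name__)   # str
try:
    criteria.values()            # same call as judge_agent.py line 200
except AttributeError as exc:
    print(exc)                   # 'str' object has no attribute 'values'
```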
Why This Happens
- Complex Dynamic Schema: The function schema uses dynamically generated property names (sanitized criterion text, truncated to 70 chars), which can confuse the LLM
- Schema Ambiguity: With many criteria, the nested object structure may be misinterpreted
- LLM Behavior: Some LLM models serialize nested objects as JSON strings when uncertain about the schema format
- No Validation: The code does not validate the type of `criteria` before calling `.values()`
Steps to Reproduce
- Create a `JudgeAgent` with multiple criteria:

```python
import scenario

judge = scenario.JudgeAgent(
    criteria=[
        "Agent must provide accurate information",
        "Agent must not show error messages",
        "If data is unavailable, Agent must acknowledge this explicitly",
    ]
)
```

- Use the judge in a scenario that runs multiple times
- The error occurs intermittently when the LLM returns `criteria` as a string
Expected Behavior
The code should handle both cases:
- When `criteria` is a dictionary (normal case)
- When `criteria` is a JSON string (edge case that needs parsing)
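Both cases can be folded into one small, unit-testable coercion helper. This is a sketch, not library code; the name `coerce_criteria` is made up for illustration:

```python
import json

def coerce_criteria(value):
    """Return a dict of criterion verdicts, tolerating a JSON-string payload."""
    if isinstance(value, dict):
        return value  # normal case: already a dict
    if isinstance(value, str):
        try:
            parsed = json.loads(value)  # edge case: double-encoded JSON
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}  # any other unexpected type

print(coerce_criteria({"criterion_0": "true"}))    # {'criterion_0': 'true'}
print(coerce_criteria('{"criterion_0": "true"}'))  # {'criterion_0': 'true'}
print(coerce_criteria("not json"))                 # {}
```

Unlike the inline fix, the helper also re-checks the type after parsing, so a double-encoded JSON array cannot slip through as a non-dict.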
Proposed Fix
Add defensive parsing in scenario/judge_agent.py around line 196:
```python
criteria = args.get("criteria", {})

# Add defensive parsing for the string case
if isinstance(criteria, str):
    try:
        criteria = json.loads(criteria)  # Parse if it's a JSON string
    except json.JSONDecodeError:
        criteria = {}  # Fall back to an empty dict
elif not isinstance(criteria, dict):
    criteria = {}  # Ensure it's a dict

# Now safely use criteria.values()
for idx, criterion in enumerate(criteria.values()):
    ...
```

Impact
- Severity: Medium - Causes test failures but is intermittent
- Frequency: Intermittent - depends on LLM response format
- Workaround: None currently available (would require monkey-patching the library)
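For completeness, here is a hedged sketch of what a monkey-patch workaround could look like: a wrapper around the standard `json.loads` that re-parses a string-typed `criteria` field. The name `tolerant_loads` and the patch target are hypothetical, and patching the shared `json` module would affect the whole process, so this is shown only as an illustration of the shape of such a workaround:

```python
import functools
import json

def tolerant_loads(original_loads):
    """Wrap json.loads so a string-valued 'criteria' field is re-parsed."""
    @functools.wraps(original_loads)
    def wrapper(s, *args, **kwargs):
        result = original_loads(s, *args, **kwargs)
        if isinstance(result, dict) and isinstance(result.get("criteria"), str):
            try:
                result["criteria"] = original_loads(result["criteria"])
            except json.JSONDecodeError:
                result["criteria"] = {}
        return result
    return wrapper

# Hypothetical application (NOT verified against the library; patching the
# json module is process-wide, which is why this is not a real workaround):
#   import scenario.judge_agent  # noqa
#   json.loads = tolerant_loads(json.loads)

payload = '{"criteria": "{\\"criterion_0\\": \\"true\\"}"}'
print(tolerant_loads(json.loads)(payload)["criteria"])  # {'criterion_0': 'true'}
```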
Additional Context
The schema definition for criteria uses a dictionary comprehension to create dynamic properties:
```python
"criteria": {
    "type": "object",
    "properties": {
        criteria_names[idx]: {
            "type": "string",
            "enum": ["true", "false", "inconclusive"],
            "description": criterion,
        }
        for idx, criterion in enumerate(self.criteria)
    },
    "required": criteria_names,
    "additionalProperties": False,
    "description": "Strict verdict for each criterion",
}
```

The dynamic property names (sanitized and truncated criterion text) may contribute to the LLM's confusion about the expected format.
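To make the dynamic-property-name issue concrete, here is a self-contained sketch of how such a schema could be built. The `sanitize` function is a made-up stand-in (the library's actual sanitization rules are not shown in this report); only the 70-character truncation is taken from the description above:

```python
import re

criteria = [
    "Agent must provide accurate information",
    "Agent must not show error messages",
]

# Hypothetical sanitizer: lowercase, non-alphanumerics to underscores, 70-char cap
def sanitize(text):
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")[:70]

criteria_names = [sanitize(c) for c in criteria]
schema = {
    "type": "object",
    "properties": {
        criteria_names[idx]: {
            "type": "string",
            "enum": ["true", "false", "inconclusive"],
            "description": criterion,
        }
        for idx, criterion in enumerate(criteria)
    },
    "required": criteria_names,
    "additionalProperties": False,
}
print(criteria_names[0])  # agent_must_provide_accurate_information
```

Long, free-text-derived property keys like these are exactly the kind of schema that an LLM may serialize as a string instead of a nested object.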
Related Code Location
- File: scenario/judge_agent.py
- Method: JudgeAgent.call()
- Lines: ~193-200, ~205