JudgeAgent AttributeError when LLM returns criteria as string instead of dict #161

@rajdeepmahal24

Summary

JudgeAgent raises AttributeError: 'str' object has no attribute 'values' when the LLM returns the criteria field as a JSON string instead of a dictionary object. The failure is intermittent and occurs when the LLM misinterprets the function-calling schema.

Environment

  • Package: langwatch-scenario
  • Version: 0.7.14
  • Python Version: 3.13.4
  • OS: macOS

Description

When using JudgeAgent with a list of criteria, the agent sometimes fails with:

AttributeError: 'str' object has no attribute 'values'

This occurs in scenario/judge_agent.py at line 200 (and 205) when the code attempts to call criteria.values() on what it expects to be a dictionary, but is actually a string.

Root Cause

The issue occurs in the JudgeAgent.call() method when parsing the LLM's function call response:

# Line 193: Parse tool call arguments
args = json.loads(tool_call.function.arguments)
criteria = args.get("criteria", {})

# Line 200: Fails if criteria is a string
for idx, criterion in enumerate(criteria.values()):  # ❌ AttributeError if criteria is str
    ...

Expected behavior: The LLM should return criteria as a dictionary object:

{
  "criteria": {
    "criterion_0": "true",
    "criterion_1": "false"
  }
}

Actual behavior (when bug occurs): The LLM sometimes returns criteria as a JSON string:

{
  "criteria": "{\"criterion_0\": \"true\", \"criterion_1\": \"false\"}"
}

json.loads() decodes only the outer JSON document, so criteria stays a string and is never parsed into a dictionary.
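The failure can be reproduced without an LLM call by building the double-encoded payload by hand. This is a minimal sketch: `raw_arguments` stands in for `tool_call.function.arguments` and is an assumption for illustration, not output captured from the library.

```python
import json

# Simulated tool-call arguments in the double-encoded form described above.
raw_arguments = json.dumps(
    {"criteria": json.dumps({"criterion_0": "true", "criterion_1": "false"})}
)

# Same two steps as in JudgeAgent.call(): only the outer layer is decoded.
args = json.loads(raw_arguments)
criteria = args.get("criteria", {})
print(type(criteria).__name__)

# Calling .values() on the string raises the reported AttributeError.
try:
    criteria.values()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# A second json.loads() on the inner string recovers the dictionary.
decoded = json.loads(criteria)
print(decoded)
```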

Why This Happens

  1. Complex Dynamic Schema: The function schema uses dynamically generated property names (sanitized criterion text, truncated to 70 chars), which can confuse the LLM
  2. Schema Ambiguity: With many criteria, the nested object structure may be misinterpreted
  3. LLM Behavior: Some LLMs serialize nested objects as JSON strings when uncertain about the schema format
  4. No Validation: The code doesn't validate the type of criteria before calling .values()

Steps to Reproduce

  1. Create a JudgeAgent with multiple criteria:
import scenario

judge = scenario.JudgeAgent(
    criteria=[
        "Agent must provide accurate information",
        "Agent must not show error messages",
        "If data is unavailable, Agent must acknowledge this explicitly",
    ]
)
  2. Use the judge in a scenario that runs multiple times
  3. The error occurs intermittently when the LLM returns criteria as a string

Expected Behavior

The code should handle both cases:

  • When criteria is a dictionary (normal case)
  • When criteria is a JSON string (edge case that needs parsing)

Proposed Fix

Add defensive parsing in scenario/judge_agent.py around line 196:

criteria = args.get("criteria", {})

# The LLM sometimes double-encodes the object as a JSON string; decode it
if isinstance(criteria, str):
    try:
        criteria = json.loads(criteria)
    except json.JSONDecodeError:
        criteria = {}  # Fallback to empty dict

# The decoded value may still be a list, string, number, or None
if not isinstance(criteria, dict):
    criteria = {}  # Ensure it's a dict

# Now safely use criteria.values()
for idx, criterion in enumerate(criteria.values()):
    ...
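The defensive parsing could also be factored into a small helper so it can be unit tested without an LLM call. The sketch below is one possible shape; `coerce_criteria` is a hypothetical name and is not part of langwatch-scenario. It also returns an empty dict when the string decodes to something other than a JSON object (for example a list).

```python
import json
from typing import Any, Dict


def coerce_criteria(value: Any) -> Dict[str, Any]:
    """Return `value` as a dict, decoding one level of JSON string if needed.

    Hypothetical helper for illustration; anything that cannot be
    interpreted as a JSON object becomes an empty dict.
    """
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return {}
    return value if isinstance(value, dict) else {}


# Normal case: already a dict.
print(coerce_criteria({"criterion_0": "true"}))
# Edge case from this issue: dict serialized as a JSON string.
print(coerce_criteria('{"criterion_0": "true"}'))
# Unusable inputs fall back to an empty dict.
print(coerce_criteria("not json"), coerce_criteria("[1, 2]"), coerce_criteria(None))
```

Falling back to an empty dict avoids the crash, but it also hides the malformed response; logging or retrying at that point may be preferable, depending on how the judge should treat missing verdicts.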

Impact

  • Severity: Medium - Causes test failures but is intermittent
  • Frequency: Intermittent - depends on LLM response format
  • Workaround: None currently available (would require monkey-patching the library)

Additional Context

The schema definition for criteria uses a dictionary comprehension to create dynamic properties:

"criteria": {
    "type": "object",
    "properties": {
        criteria_names[idx]: {
            "type": "string",
            "enum": ["true", "false", "inconclusive"],
            "description": criterion,
        }
        for idx, criterion in enumerate(self.criteria)
    },
    "required": criteria_names,
    "additionalProperties": False,
    "description": "Strict verdict for each criterion",
}

The dynamic property names (sanitized and truncated criterion text) may contribute to the LLM's confusion about the expected format.
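To make the shape of the generated schema concrete, here is a self-contained sketch that builds it from a list of criteria. The `sanitize_name` function is an assumption: the issue only says names are sanitized and truncated to 70 characters, so the exact rules used by the library may differ. `build_criteria_schema` is likewise a hypothetical wrapper around the comprehension shown above.

```python
import re
from typing import Any, Dict, List


def sanitize_name(criterion: str, max_len: int = 70) -> str:
    # Hypothetical sanitizer: replace runs of non-alphanumeric characters
    # with "_", lowercase, and truncate. The library's real rules are not
    # shown in this issue.
    return re.sub(r"[^a-zA-Z0-9]+", "_", criterion).strip("_").lower()[:max_len]


def build_criteria_schema(criteria: List[str]) -> Dict[str, Any]:
    # One property per criterion, keyed by its sanitized name.
    criteria_names = [sanitize_name(c) for c in criteria]
    return {
        "type": "object",
        "properties": {
            criteria_names[idx]: {
                "type": "string",
                "enum": ["true", "false", "inconclusive"],
                "description": criterion,
            }
            for idx, criterion in enumerate(criteria)
        },
        "required": criteria_names,
        "additionalProperties": False,
        "description": "Strict verdict for each criterion",
    }


schema = build_criteria_schema(
    [
        "Agent must provide accurate information",
        "If data is unavailable, Agent must acknowledge this explicitly",
    ]
)
print(list(schema["properties"]))
```

With long, sentence-like property names such as these, the model has to reproduce each key exactly, which is one plausible reason it sometimes falls back to emitting the whole object as a string.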

Related Code Location

  • File: scenario/judge_agent.py
  • Method: JudgeAgent.call()
  • Lines: ~193-200, ~205
