fix(workflow): preserve non-ASCII characters in LLM agent node input #6282

Open
anneheartrecord wants to merge 1 commit into google:main from anneheartrecord:fix/input-schema-non-ascii
@anneheartrecord

Summary

input_schema inputs containing non-Latin characters (Hebrew, Chinese, etc.) reach the LLM as \uXXXX escapes, which bloats prompt tokens (~6x for Hebrew) and degrades model responses, as reported in #6279.
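The size effect can be reproduced with the standard library alone. The sketch below uses a made-up payload (the `greeting` key and its Hebrew value are assumptions for illustration, not taken from the issue) and compares character counts only; actual token counts depend on the model's tokenizer.

```python
import json

# Hypothetical sample input; any non-ASCII text shows the same effect.
payload = {"greeting": "שלום"}

# Default behaviour: ensure_ascii=True turns each non-ASCII character
# into a six-character \uXXXX escape sequence.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are written as-is.
raw = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(raw)
print(len(escaped), len(raw))

# Both strings decode to the same data; only the text form differs.
assert json.loads(escaped) == json.loads(raw) == payload
```

Each Hebrew letter occupies one character in `raw` and six in `escaped`, which is consistent with the roughly 6x figure quoted above for the escaped portion of the text.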

The escaping comes from json.dumps being called with its default ensure_ascii=True on the LLM-bound input text. Two paths were affected:

  • workflow/_llm_agent_wrapper.py – _node_input_to_content() for dict/list node input.
  • flows/llm_flows/contents.py – _build_task_input_user_content(), which rebuilds a delegated task's function-call args as the sub-agent's first user turn. This path takes priority over the wrapper fallback, so it affects the common chat/root → task sub-agent delegation case.

Both now serialize with ensure_ascii=False, matching how the output-schema path already serializes responses (_output_schema_processor.py), fixed earlier in #2936/#2937.
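As a rough illustration of the pattern (not the actual ADK code), the helper below serializes dict/list input with `ensure_ascii=False`. The name `serialize_for_llm` and the sample input are hypothetical; the real functions touched by this PR are the two listed above.

```python
import json
from typing import Any


def serialize_for_llm(node_input: Any) -> str:
    """Hypothetical stand-in for the LLM-bound serialization step."""
    if isinstance(node_input, (dict, list)):
        # ensure_ascii=False keeps non-ASCII characters unescaped in the output.
        return json.dumps(node_input, ensure_ascii=False)
    # Anything else is passed through as plain text in this sketch.
    return str(node_input)


# Hypothetical node input mixing Chinese and Hebrew text.
node_input = {"query": "你好", "tags": ["שלום"]}
text = serialize_for_llm(node_input)
print(text)

# The serialized text still round-trips to the original structure.
assert json.loads(text) == node_input
```

The output remains valid JSON, so a consumer that parses it sees the same data as before; only the text the model reads changes.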

Note: the BaseModel branch (model_dump_json()) is intentionally left unchanged — Pydantic v2 already emits raw UTF-8 (does not escape non-ASCII), which the added regression test confirms.

Testing

  • tests/unittests/workflow/test_llm_agent_as_node.py – dict/list/BaseModel node input preserves non-ASCII.
  • tests/unittests/flows/llm_flows/test_contents.py – delegated task FC args preserve non-ASCII.

Both new tests fail before the change and pass after.

Fixes #6279

Node input passed to an LLM agent (workflow node input and delegated-task
function-call args) was serialized with json.dumps' default ensure_ascii=True,
escaping non-Latin characters to \uXXXX. This bloats prompt tokens and
degrades model responses for non-English inputs. Serialize with
ensure_ascii=False so characters reach the model as-is, matching how the
output-schema path already serializes responses.
@google-cla

google-cla Bot commented Jul 3, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Development

Successfully merging this pull request may close these issues.

Input Schema: Non ASCII characters are being escaped to Unicode