Describe the Bug:
When an Agent is defined with input_schema in order to serialize a Pydantic object, non-Latin characters are escaped to \uXXXX Unicode escape sequences. This causes token bloat and slow LLM responses, and in some cases the LLM cannot return a readable response and instead returns symbols mixed with the original language of the prompt. In my case, for a RAG solution, I sent context in Hebrew to an Agent and each letter was replaced with a 6-character ASCII escape sequence, which led to a high number of prompt tokens and inconsistent LLM responses.
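To illustrate the escaping described above outside of ADK, here is a minimal sketch that uses only the standard library `json` module. The sample string and the `text` key are made up for illustration. It compares character counts only; the actual token increase depends on the model's tokenizer.

```python
import json

hebrew = "שלום עולם"  # sample Hebrew text ("hello world"), made up for this example

# Default behavior (ensure_ascii=True): each Hebrew letter becomes a
# 6-character \uXXXX escape sequence.
escaped = json.dumps({"text": hebrew})

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps({"text": hebrew}, ensure_ascii=False)

print(escaped)
print(readable)
print(len(escaped), len(readable))
```

Both strings decode to the same object with `json.loads`, so the only difference is the size and readability of what reaches the model.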
Steps to Reproduce:
pip install google-adk==2.2.0 google-genai==2.8.0
- Sample Python code that triggers the issue (I added a before_model_callback to verify that the data sent is Unicode-escaped). Any Agent using input_schema will reproduce it. Code for reference:
from typing import Literal

from google.adk.agents import Agent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmRequest
from google.genai import types
from pydantic import BaseModel, Field

# RetrievalDecision (the output schema) and _agent_model() are defined
# elsewhere in my service and are omitted here.


class SearchAttempt(BaseModel):
    query: str
    surrounding_chunks: int
    result_count: int
    error_message: str | None = None


class NormalizedChunk(BaseModel):
    retrieval_index: int
    document_bucket: str
    document_file_name: str
    chunk_index: int
    distance: float | None = None
    title: str | None = None
    publication_date: str | None = None
    section_title: str | None = None
    section_summary: str | None = None
    text: str | None = None


class SearchResult(BaseModel):
    status: Literal["success", "error"]
    query: str
    surrounding_chunks: int
    normalized_chunks: list[NormalizedChunk] = Field(default_factory=list)
    error_message: str | None = None
    results_count: int


class EvaluationInput(BaseModel):
    original_query: str
    search_result: SearchResult
    attempts: list[SearchAttempt]


def debug_rag_evaluator_request(
    callback_context: CallbackContext,
    llm_request: LlmRequest,
) -> None:
    # Print the text parts exactly as they are sent to the model.
    print("========== raw parts ==========")
    for content in llm_request.contents or []:
        for part in content.parts or []:
            if part.text:
                print(part.text[:5000])
    print("========== end raw parts ==========")


rag_evaluator_agent = Agent(
    name="rag_evaluator_agent",
    model=_agent_model(),
    input_schema=EvaluationInput,
    output_schema=RetrievalDecision,
    include_contents="none",
    before_model_callback=debug_rag_evaluator_request,
    generate_content_config=types.GenerateContentConfig(
        response_mime_type="application/json",
        temperature=0.0,
        max_output_tokens=8192,
    ),
    instruction="""
......
""",
)
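Instead of inspecting the printed parts by eye, the debug callback could count the escape sequences in each text part. The helper below is a hypothetical sketch (`count_unicode_escapes` is not part of ADK) that uses a regular expression to find literal \uXXXX sequences.

```python
import re

# Matches a literal JSON-style escape: backslash, 'u', four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def count_unicode_escapes(text: str) -> int:
    r"""Return how many literal \uXXXX escape sequences appear in text."""
    return len(_UNICODE_ESCAPE.findall(text))
```

Inside the callback this could be called as `count_unicode_escapes(part.text)`. A non-zero count for input that was written in Hebrew suggests the payload was escaped before being sent. Text that legitimately contains such sequences would also be counted, so this is a rough diagnostic only.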
Expected Behavior:
It seems that for response_schema this issue was fixed in #2936 by passing ensure_ascii=False in the output schema dump logic. The same should be done for the input schema, or users should at least be able to choose whether to use that flag.
The bad behavior comes from _node_input_to_content, which calls json.dumps and model_dump_json without ensure_ascii=False. I ended up monkey-patching the function in my service, which makes ADK send the request with proper Hebrew letters, but of course this is discouraged.
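As a rough sketch of the requested behavior, the helper below serializes input with non-ASCII characters preserved by default while still letting the caller opt back into escaping. `to_json_text` is a hypothetical name and is not ADK's `_node_input_to_content`. It assumes that a Pydantic v2 model can be converted to plain builtins via `model_dump(mode="json")`.

```python
import json
from typing import Any


def to_json_text(value: Any, *, ensure_ascii: bool = False) -> str:
    """Serialize value to JSON text, keeping non-ASCII characters by default."""
    # Pydantic v2 models expose model_dump(); mode="json" yields JSON-safe builtins.
    if hasattr(value, "model_dump"):
        value = value.model_dump(mode="json")
    return json.dumps(value, ensure_ascii=ensure_ascii)
```

With this shape, `to_json_text(evaluation_input)` would keep Hebrew text readable, and `to_json_text(evaluation_input, ensure_ascii=True)` would reproduce the current escaped output for callers that need pure ASCII.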
Observed Behavior:
The request to the LLM becomes bloated (~6 times more tokens than with unescaped letters). LLM responses were very slow and unpredictable (huge responses with many repeating symbols, signs mixed with Hebrew letters, etc.).
Environment Details:
- ADK Library Version (pip show google-adk): 2.2.0
- Desktop OS: Linux (WSL)
- Python Version (python -V): 3.12
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.5-flash
How often has this issue occurred?: