Description
Check for existing issues
- I have searched the existing issues and checked that my issue is not a duplicate.
The Feature
Summary
Add support for a `{"location": "tool_config"}` option in `cache_control_injection_points` to enable automatic injection of Bedrock Converse API `cachePoint` markers into the `toolConfig.tools` array. This would allow users to cache tool definitions alongside system prompts, which is supported by Bedrock but not currently accessible through LiteLLM's auto-injection mechanism.
Motivation
AWS Bedrock's Converse API supports prompt caching on three fields for Claude models: `system`, `messages`, and `tools` (docs). The current `cache_control_injection_points` feature only supports injecting `cache_control` into messages (by role or index). There is no way to inject a `cachePoint` into the `toolConfig` section of the request.
For agentic applications with many tools (e.g., 5+ MCP servers with 20+ tool definitions), the tool definitions represent a significant portion of the input tokens that are static across requests. Being able to cache these alongside the system prompt would provide substantial cost and latency savings.
Current Behavior
```python
response = completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[...],
    tools=[...],  # 20+ tool definitions
    cache_control_injection_points=[
        {"location": "message", "role": "system"},
        # No way to target tools
    ],
)
```

LiteLLM injects `cache_control` into the system message, but the tool definitions in `toolConfig.tools` are sent without any `cachePoint`, so they are reprocessed on every request.
Proposed Behavior
```python
response = completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[...],
    tools=[...],
    cache_control_injection_points=[
        {"location": "message", "role": "system"},
        {"location": "tool_config"},  # NEW: inject cachePoint after last tool
    ],
)
```

LiteLLM would append a `cachePoint` entry to the end of the `toolConfig.tools` array in the Bedrock Converse API request:
```json
{
  "toolConfig": {
    "tools": [
      {"toolSpec": {"name": "tool_1", ...}},
      {"toolSpec": {"name": "tool_2", ...}},
      {"cachePoint": {"type": "default"}}
    ]
  }
}
```

This follows the same pattern Bedrock uses for `system` and message cache checkpoints, as documented in the Bedrock prompt caching guide.
Supported Models
Per the Bedrock docs, the following models support `tools` as a cache checkpoint field:
| Model | Tools Caching |
|---|---|
| Claude 3.7 Sonnet | Yes |
| Claude 3.5 Sonnet v2 | Yes |
| Claude 3.5 Haiku | Yes |
| Claude Sonnet 4 | Yes |
| Claude Opus 4 | Yes |
| Amazon Nova models | No (only system and messages) |
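Since support varies by model family, the injection would presumably need a gate. A minimal sketch of such a check, assuming a hypothetical helper name and an illustrative allow-list derived from the table above (LiteLLM's actual implementation would more likely consult its model-info registry):

```python
# Illustrative allow-list of Bedrock base-model ID prefixes that support
# a cachePoint in toolConfig.tools (per the table above).
TOOLS_CACHE_SUPPORTED_PREFIXES = (
    "anthropic.claude-3-7-sonnet",
    "anthropic.claude-3-5-sonnet-20241022",  # 3.5 Sonnet v2
    "anthropic.claude-3-5-haiku",
    "anthropic.claude-sonnet-4",
    "anthropic.claude-opus-4",
)


def supports_tools_cache(model_id: str) -> bool:
    """Return True if the Bedrock model supports tool-definition caching.

    Hypothetical helper: strips the "bedrock/" provider prefix and an
    optional cross-region inference prefix ("us.", "eu.", "apac.") before
    matching against the allow-list.
    """
    base = model_id.split("bedrock/")[-1]
    for region_prefix in ("us.", "eu.", "apac."):
        if base.startswith(region_prefix):
            base = base[len(region_prefix):]
            break
    return base.startswith(TOOLS_CACHE_SUPPORTED_PREFIXES)
```

Amazon Nova models would fall through to `False`, so a `{"location": "tool_config"}` injection point could be silently skipped (or raise) for them.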
Implementation Notes
The injection logic would be similar to the existing message injection in `litellm/litellm_core_utils/prompt_templates/`:
- When `{"location": "tool_config"}` is present in `cache_control_injection_points`
- And the request includes a non-empty `tools` list
- After LiteLLM translates OpenAI-format tools to Bedrock's `toolConfig.tools` format
- Append `{"cachePoint": {"type": "default"}}` as the last entry in the `tools` array
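The steps above could be sketched as a small post-translation helper. This is a minimal sketch with a hypothetical function name, not LiteLLM's actual internal API; it operates on a dict already in Bedrock Converse `toolConfig` shape:

```python
def inject_tool_config_cache_point(tool_config: dict) -> dict:
    """Append a Bedrock cachePoint marker after the last tool definition.

    `tool_config` is assumed to be in Bedrock Converse shape:
    {"tools": [{"toolSpec": {...}}, ...]}.
    """
    tools = tool_config.get("tools")
    if not tools:
        # No tools to cache; leave the request untouched.
        return tool_config
    if any("cachePoint" in entry for entry in tools):
        # Already injected (or user supplied one manually); stay idempotent.
        return tool_config
    tools.append({"cachePoint": {"type": "default"}})
    return tool_config
```

Making the helper idempotent guards against double-injection if both the user's raw request and the injection-point config add a marker.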
For the Anthropic direct API (non-Bedrock), the equivalent would be adding `"cache_control": {"type": "ephemeral"}` to the last tool definition, following Anthropic's API spec.
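The Anthropic-direct variant differs only in shape: instead of appending a separate array entry, the marker goes on the last tool object itself. A sketch with a hypothetical helper name:

```python
def inject_anthropic_tool_cache_control(tools: list) -> list:
    """Mark the last Anthropic-format tool as a cache breakpoint.

    Anthropic's API places cache_control on the tool object itself,
    rather than as a standalone entry like Bedrock's cachePoint.
    """
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}
    return tools
```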
Use Case
We run an agentic application (OpenAI Agents SDK + LiteLLM + Bedrock) with 5 MCP servers providing 20+ tools. The system prompt is ~4K tokens and tool definitions add another ~3K tokens. Currently we can only cache the system prompt via `cache_control_injection_points`. Being able to cache tools would roughly double the cached prefix size, further reducing per-request cost and latency.
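To make the savings concrete, here is a rough back-of-envelope calculation. The rates are assumptions (Claude 3.7 Sonnet on-demand input at $3/MTok, cache reads billed at ~10% of that, steady-state reads only, ignoring the one-time cache-write premium); only the 4K/3K token figures come from our workload above:

```python
BASE_RATE = 3.00 / 1_000_000  # assumed $/input token (Claude 3.7 Sonnet on-demand)
CACHE_READ_DISCOUNT = 0.1     # assumed: cache reads at ~10% of base input rate

SYSTEM_TOKENS = 4_000  # cacheable today via cache_control_injection_points
TOOL_TOKENS = 3_000    # not cacheable today


def prefix_cost(cached_tokens: int, uncached_tokens: int) -> float:
    """Per-request cost of the static prefix under the assumed rates."""
    return (cached_tokens * BASE_RATE * CACHE_READ_DISCOUNT
            + uncached_tokens * BASE_RATE)


today = prefix_cost(SYSTEM_TOKENS, TOOL_TOKENS)       # tools paid at full rate
proposed = prefix_cost(SYSTEM_TOKENS + TOOL_TOKENS, 0)  # whole prefix cached
```

Under these assumptions the static-prefix cost drops by roughly 4-5x per request once the tool definitions are cached too.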
Related Issues
- [Bug]: Cache control injection points for Anthropic/Bedrock #10226 — Cache control injection points for Anthropic/Bedrock (general improvements)
- [Bug: Anthropic Bedrock Converse]: assistant & tool messages dropping cache points #12695 — Assistant & tool messages dropping cache points
Environment
- LiteLLM version: 1.81.9
- Provider: AWS Bedrock (Converse API)
- Model: `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
Motivation, pitch
I'm building an agentic application using LiteLLM with AWS Bedrock (Claude 3.7 Sonnet) that has 20+ tool definitions from multiple MCP servers. I'm using `cache_control_injection_points` with `{"location": "message", "role": "system"}` to cache the system prompt, which works well.
However, Bedrock's Converse API supports prompt caching on `system`, `messages`, AND `tools` for Claude models (AWS docs). There's currently no way to use `cache_control_injection_points` to inject a `cachePoint` into the `toolConfig.tools` array.
For tool-heavy agentic workloads, the tool definitions are a significant chunk of static input tokens that get reprocessed on every request. Being able to cache them alongside the system prompt would roughly double the cached prefix, further reducing cost and latency.
Related: #10226 (closed — addressed negative index support, not tool caching)
What part of LiteLLM is this about?
SDK (litellm Python package)
LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?
No
Twitter / LinkedIn details
No response