[Feature]: Support cache_control_injection_points for tools location #21969

@vishwanath-gowda

Description

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Summary

Add support for a "location": "tool_config" option in cache_control_injection_points to enable automatic injection of Bedrock Converse API cachePoint markers into the toolConfig.tools array. This would allow users to cache tool definitions alongside system prompts, which is supported by Bedrock but not currently accessible through LiteLLM's auto-injection mechanism.

Motivation

AWS Bedrock's Converse API supports prompt caching on three fields for Claude models: system, messages, and tools (docs). The current cache_control_injection_points feature only supports injecting cache_control into messages (by role or index). There is no way to inject a cachePoint into the toolConfig section of the request.

For agentic applications with many tools (e.g., 5+ MCP servers with 20+ tool definitions), the tool definitions represent a significant portion of the input tokens that are static across requests. Being able to cache these alongside the system prompt would provide substantial cost and latency savings.

Current Behavior

response = completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[...],
    tools=[...],  # 20+ tool definitions
    cache_control_injection_points=[
        {"location": "message", "role": "system"},
        # No way to target tools
    ],
)

LiteLLM injects cache_control into the system message, but the tool definitions in toolConfig.tools are sent without a cachePoint, so they are reprocessed on every request.

Proposed Behavior

response = completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[...],
    tools=[...],
    cache_control_injection_points=[
        {"location": "message", "role": "system"},
        {"location": "tool_config"},  # NEW: inject cachePoint after last tool
    ],
)

LiteLLM would append a cachePoint entry to the end of the toolConfig.tools array in the Bedrock Converse API request:

{
  "toolConfig": {
    "tools": [
      {"toolSpec": {"name": "tool_1", ...}},
      {"toolSpec": {"name": "tool_2", ...}},
      {"cachePoint": {"type": "default"}}
    ]
  }
}

This follows the same pattern Bedrock uses for system and message cache checkpoints, as documented in the Bedrock prompt caching guide.

Supported Models

Per the Bedrock docs, the following models support tools as a cache checkpoint field:

| Model | Tools Caching |
| --- | --- |
| Claude 3.7 Sonnet | Yes |
| Claude 3.5 Sonnet v2 | Yes |
| Claude 3.5 Haiku | Yes |
| Claude Sonnet 4 | Yes |
| Claude Opus 4 | Yes |
| Amazon Nova models | No (only system and messages) |
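Since not every Bedrock model accepts a cachePoint in tools, the injection would likely need a capability gate. A minimal sketch of such a check, using a hypothetical helper name and substring markers derived from the table above (not LiteLLM's actual capability system):

```python
# Hypothetical helper (not part of LiteLLM): decide whether a Bedrock model
# supports caching tool definitions, based on the support table above.
TOOLS_CACHING_SUPPORTED = (
    "claude-3-7-sonnet",
    "claude-3-5-sonnet",  # v2
    "claude-3-5-haiku",
    "claude-sonnet-4",
    "claude-opus-4",
)


def supports_tools_caching(model: str) -> bool:
    """Return True if the model ID matches a model known to support
    the tools field as a cache checkpoint."""
    return any(marker in model for marker in TOOLS_CACHING_SUPPORTED)
```

With this, a "tool_config" injection point could be silently skipped (or warned about) for models like Amazon Nova that only support system and messages checkpoints.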

Implementation Notes

The injection logic would be similar to the existing message injection in litellm/litellm_core_utils/prompt_templates/:

  1. When {"location": "tool_config"} is present in cache_control_injection_points
  2. And the request includes tools (non-empty)
  3. After LiteLLM translates OpenAI-format tools to Bedrock's toolConfig.tools format
  4. Append {"cachePoint": {"type": "default"}} as the last entry in the tools array
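The four steps above could be sketched as a small post-translation pass. The function name and signature are assumptions for illustration, not LiteLLM's actual internals:

```python
from typing import Any, Dict, List


def inject_tool_config_cache_point(
    bedrock_tools: List[Dict[str, Any]],
    injection_points: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Sketch of the proposed injection: runs after OpenAI-format tools have
    been translated to Bedrock's toolConfig.tools format. Appends a cachePoint
    marker when {"location": "tool_config"} is requested and at least one
    tool is present."""
    wants_tool_cache = any(
        point.get("location") == "tool_config" for point in injection_points
    )
    if wants_tool_cache and bedrock_tools:
        # Return a new list rather than mutating the caller's tools in place.
        return bedrock_tools + [{"cachePoint": {"type": "default"}}]
    return bedrock_tools
```

Requests without tools, or without a "tool_config" injection point, pass through unchanged, so existing behavior is preserved.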

For the Anthropic direct API (non-Bedrock), the equivalent would be adding cache_control: {"type": "ephemeral"} to the last tool definition, following Anthropic's API spec.
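For the Anthropic path, the difference is that the marker is a field on the last tool rather than a standalone array entry. A minimal sketch, with an assumed helper name:

```python
import copy
from typing import Any, Dict, List


def inject_anthropic_tool_cache_control(
    tools: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Sketch for the Anthropic direct API: mark the last tool definition
    with cache_control: {"type": "ephemeral"} instead of appending a
    separate cachePoint entry as Bedrock does."""
    if not tools:
        return tools
    tools = copy.deepcopy(tools)  # avoid mutating the caller's tool list
    tools[-1]["cache_control"] = {"type": "ephemeral"}
    return tools
```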

Use Case

We run an agentic application (OpenAI Agents SDK + LiteLLM + Bedrock) with 5 MCP servers providing 20+ tools. The system prompt is ~4K tokens and tool definitions add another ~3K tokens. Currently we can only cache the system prompt via cache_control_injection_points. Being able to cache tools would roughly double the cached prefix size, further reducing per-request cost and latency.

Related Issues

Environment

  • LiteLLM version: 1.81.9
  • Provider: AWS Bedrock (Converse API)
  • Model: us.anthropic.claude-3-7-sonnet-20250219-v1:0

Motivation, pitch

I'm building an agentic application using LiteLLM with AWS Bedrock (Claude 3.7 Sonnet) that has 20+ tool definitions from multiple MCP servers. I'm using cache_control_injection_points with {"location": "message", "role": "system"} to cache the system prompt, which works well.

However, Bedrock's Converse API supports prompt caching on system, messages, AND tools for Claude models (AWS docs). There's currently no way to use cache_control_injection_points to inject a cachePoint into the toolConfig.tools array.

For tool-heavy agentic workloads, the tool definitions are a significant chunk of static input tokens that get reprocessed on every request. Being able to cache them alongside the system prompt would roughly double the cached prefix, further reducing cost and latency.

Related: #10226 (closed — addressed negative index support, not tool caching)

What part of LiteLLM is this about?

SDK (litellm Python package)

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No

Twitter / LinkedIn details

No response
