
cache_control blocks stripped when routing to Vertex AI Anthropic models #1579

@elmanGSB

Description

When routing requests to Anthropic Claude models on Vertex AI through Portkey, cache_control blocks in message content are not forwarded to the Vertex AI endpoint. As a result, Anthropic's prompt caching feature does not work when Portkey is used as a gateway to Vertex AI.

Steps to Reproduce

1. Send a request containing cache_control through the Portkey /v1/messages endpoint:

curl -X POST "https://api.portkey.ai/v1/messages" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -H "x-api-key: dummy" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "@my-vertex-provider/anthropic.claude-sonnet-4-6",
    "max_tokens": 50,
    "system": [{
      "type": "text",
      "text": "<large text content over 1024 tokens>",
      "cache_control": {"type": "ephemeral"}
    }],
    "messages": [{"role": "user", "content": "hello"}]
  }'
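For context, this is a sketch of the body a gateway would need to forward upstream for caching to work. On Vertex AI, the Anthropic API takes an anthropic_version field in the body instead of the anthropic-version header; how Portkey builds its upstream request internally is an assumption here, but the essential point holds regardless: cache_control must survive the translation.

```typescript
// Sketch of the upstream request body for the Vertex AI Anthropic
// endpoint. Field names follow Anthropic's Vertex docs; the gateway's
// internal translation is an assumption, not observed behavior.
const upstreamBody = {
  anthropic_version: "vertex-2023-10-16", // Vertex takes this in the body, not a header
  max_tokens: 50,
  system: [
    {
      type: "text",
      text: "<large text content over 1024 tokens>",
      cache_control: { type: "ephemeral" }, // must NOT be stripped by the gateway
    },
  ],
  messages: [{ role: "user", content: "hello" }],
};

// Caching only works if the serialized body still carries cache_control:
console.log(JSON.stringify(upstreamBody).includes('"cache_control"')); // true
```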

2. Response shows zero cache tokens:

{
  "usage": {
    "input_tokens": 1060,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    }
  }
}

3. Same test via /v1/chat/completions also shows zero cached tokens:

{
  "usage": {
    "prompt_tokens": 1060,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
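The zero-token usage payloads above can be checked programmatically. Below is a small illustrative helper (not part of any SDK) that classifies a Messages API usage object, which is handy when reproducing this bug across calls:

```typescript
// Classify an Anthropic Messages API `usage` object to see whether
// prompt caching took effect. Field names match the usage payloads
// shown above; the helper itself is illustrative only.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function cacheStatus(usage: Usage): "cache-write" | "cache-hit" | "no-caching" {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "cache-hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "cache-write";
  return "no-caching";
}

// The usage block returned through Portkey (step 2 above):
const viaPortkey: Usage = {
  input_tokens: 1060,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 0,
};

console.log(cacheStatus(viaPortkey)); // "no-caching" — the bug
```

On a correctly forwarded first call this should report "cache-write", and "cache-hit" on subsequent calls with the same prefix.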

Expected Behavior

cache_control blocks should be forwarded to the Vertex AI Anthropic endpoint, and the response should show cache_creation_input_tokens > 0 on the first call and cache_read_input_tokens > 0 on subsequent calls with the same cached prefix.

Confirmed Working Without Portkey

When sending the same request directly to Vertex AI using @anthropic-ai/vertex-sdk (bypassing Portkey), prompt caching works correctly — showing 139K+ cached read tokens on subsequent calls.
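A sketch of that direct call with @anthropic-ai/vertex-sdk, for comparison. The project ID and model name below are placeholders, not values from this report; the request params are the same shape Portkey receives, which is what points at the gateway as the culprit:

```typescript
// With @anthropic-ai/vertex-sdk you would construct the client and send
// the same body (placeholders marked; requires GCP credentials):
//
//   import { AnthropicVertex } from "@anthropic-ai/vertex-sdk";
//   const client = new AnthropicVertex({ projectId: "my-gcp-project", region: "global" });
//   const msg = await client.messages.create(requestParams);
//
// The params mirror the Portkey request from the repro above:
const requestParams = {
  model: "claude-sonnet-4-6", // placeholder Vertex model ID
  max_tokens: 50,
  system: [
    {
      type: "text",
      text: "<large text content over 1024 tokens>",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "hello" }],
};

// Sent directly, the first response reports cache_creation_input_tokens > 0,
// and subsequent calls report cache_read_input_tokens > 0.
console.log(requestParams.system[0].cache_control.type);
```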

Environment

  • Portkey hosted gateway (api.portkey.ai)
  • Provider: Vertex AI (Anthropic Claude models, global region)
  • Models tested: anthropic.claude-sonnet-4-6, anthropic.claude-opus-4-6
  • Both /v1/messages and /v1/chat/completions endpoints tested
  • Content exceeds the 1024-token minimum required for caching

Impact

Users who route Anthropic Vertex AI requests through Portkey cannot use prompt caching, which significantly increases costs and latency for applications with large system prompts or conversation contexts.
