Description
When routing requests to Anthropic Claude models on Vertex AI through Portkey, cache_control blocks in message content are not forwarded to the Vertex AI endpoint. As a result, Anthropic's prompt caching feature does not work when Portkey is used as a gateway to Vertex AI.
Steps to Reproduce
1. Send request via Portkey /v1/messages with cache_control:
curl -X POST "https://api.portkey.ai/v1/messages" \
-H "x-portkey-api-key: $PORTKEY_API_KEY" \
-H "x-api-key: dummy" \
-H "content-type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "@my-vertex-provider/anthropic.claude-sonnet-4-6",
"max_tokens": 50,
"system": [{
"type": "text",
"text": "<large text content over 1024 tokens>",
"cache_control": {"type": "ephemeral"}
}],
"messages": [{"role": "user", "content": "hello"}]
}'
2. Response shows zero cache tokens:
{
"usage": {
"input_tokens": 1060,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"cache_creation": {
"ephemeral_5m_input_tokens": 0,
"ephemeral_1h_input_tokens": 0
}
}
}
3. Same test via /v1/chat/completions also shows zero cached tokens:
{
"usage": {
"prompt_tokens": 1060,
"prompt_tokens_details": {
"cached_tokens": 0
}
}
}
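The zero-token responses above can be checked programmatically. A minimal sketch (the helper name `caching_active` is hypothetical, not a Portkey or Anthropic API) that classifies a usage block from either endpoint shape:

```python
def caching_active(usage: dict) -> bool:
    """Return True if the usage block shows any prompt-cache activity.

    Handles both the /v1/messages shape (cache_creation_input_tokens /
    cache_read_input_tokens) and the /v1/chat/completions shape
    (prompt_tokens_details.cached_tokens).
    """
    anthropic_hits = (
        usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )
    openai_hits = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return (anthropic_hits + openai_hits) > 0


# Usage blocks observed through Portkey (steps 2 and 3 above):
messages_usage = {
    "input_tokens": 1060,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
}
chat_usage = {"prompt_tokens": 1060, "prompt_tokens_details": {"cached_tokens": 0}}

print(caching_active(messages_usage))  # False
print(caching_active(chat_usage))      # False
```

Both observed payloads classify as "no cache activity", which is the bug being reported.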
Expected Behavior
cache_control blocks should be forwarded to the Vertex AI Anthropic endpoint, and the response should show cache_creation_input_tokens > 0 on the first call and cache_read_input_tokens > 0 on subsequent calls with the same cached prefix.
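The expected invariant can be stated concretely. The usage blocks below are illustrative values, not actual Portkey or Vertex AI output; they only encode the write-on-first-call, read-on-later-calls pattern described above:

```python
# Hypothetical usage blocks for two identical requests with a cached
# prefix (token counts are illustrative, not real output).
first_call = {
    "input_tokens": 12,
    "cache_creation_input_tokens": 1048,  # prefix written to the cache
    "cache_read_input_tokens": 0,
}
second_call = {
    "input_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 1048,  # same prefix served from the cache
}

# The invariant a fixed gateway should satisfy:
assert first_call["cache_creation_input_tokens"] > 0
assert second_call["cache_read_input_tokens"] > 0
```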
Confirmed Working Without Portkey
When sending the same request directly to Vertex AI using @anthropic-ai/vertex-sdk (bypassing Portkey), prompt caching works correctly — showing 139K+ cached read tokens on subsequent calls.
Environment
- Portkey hosted gateway (api.portkey.ai)
- Provider: Vertex AI (Anthropic Claude models, global region)
- Models tested: anthropic.claude-sonnet-4-6, anthropic.claude-opus-4-6
- Both /v1/messages and /v1/chat/completions endpoints tested
- Content exceeds the 1024-token minimum required for caching
Impact
Users who route Anthropic Vertex AI requests through Portkey cannot use prompt caching, which significantly increases costs and latency for applications with large system prompts or conversation contexts.
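To put a rough number on the cost impact: assuming Anthropic's published pricing multipliers (cache writes at ~1.25x and cache reads at ~0.1x the base input-token price; the base rate below is illustrative, not a quote), a repeatedly reused large prefix costs several times more without caching:

```python
# Rough cost comparison for a 100K-token prefix reused across 50 calls.
# Assumes Anthropic's published multipliers: cache write ~1.25x, cache
# read ~0.1x the base input price. BASE is an illustrative rate only.
BASE = 3.00 / 1_000_000  # $ per input token (illustrative)
PREFIX_TOKENS = 100_000
CALLS = 50

# No caching (current behavior through Portkey): full price every call.
no_cache = PREFIX_TOKENS * BASE * CALLS

# With caching: one cache write, then cache reads on the remaining calls.
with_cache = (
    PREFIX_TOKENS * BASE * 1.25
    + PREFIX_TOKENS * BASE * 0.10 * (CALLS - 1)
)

print(f"no cache:   ${no_cache:.2f}")
print(f"with cache: ${with_cache:.2f}")
```

Under these assumptions the uncached path costs roughly 8x more for the prefix tokens alone, before counting the latency savings caching also provides.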