
cache_control blocks stripped when routing to Vertex AI Anthropic models #1579

@elmanGSB

Description

When routing requests to Anthropic Claude models on Vertex AI through Portkey, cache_control blocks in message content are not forwarded to the Vertex AI endpoint. As a result, Anthropic's prompt caching feature does not work when Portkey is used as a gateway to Vertex AI.

Steps to Reproduce

1. Send a request containing cache_control through the Portkey /v1/messages endpoint:

curl -X POST "https://api.portkey.ai/v1/messages" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -H "x-api-key: dummy" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "@my-vertex-provider/anthropic.claude-sonnet-4-6",
    "max_tokens": 50,
    "system": [{
      "type": "text",
      "text": "<large text content over 1024 tokens>",
      "cache_control": {"type": "ephemeral"}
    }],
    "messages": [{"role": "user", "content": "hello"}]
  }'
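For context, this is a sketch of the body a gateway would need to forward upstream for caching to work. On Vertex AI, the Anthropic API takes an anthropic_version field in the body instead of the anthropic-version header; how Portkey builds its upstream request internally is an assumption here, but the essential point holds regardless: cache_control must survive the translation.

```typescript
// Sketch of the upstream request body for the Vertex AI Anthropic
// endpoint. Field names follow Anthropic's Vertex docs; the gateway's
// internal translation is an assumption, not observed behavior.
const upstreamBody = {
  anthropic_version: "vertex-2023-10-16", // Vertex takes this in the body, not a header
  max_tokens: 50,
  system: [
    {
      type: "text",
      text: "<large text content over 1024 tokens>",
      cache_control: { type: "ephemeral" }, // must NOT be stripped by the gateway
    },
  ],
  messages: [{ role: "user", content: "hello" }],
};

// Caching only works if the serialized body still carries cache_control:
console.log(JSON.stringify(upstreamBody).includes('"cache_control"')); // true
```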

2. Response shows zero cache tokens:

{
  "usage": {
    "input_tokens": 1060,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    }
  }
}

3. Same test via /v1/chat/completions also shows zero cached tokens:

{
  "usage": {
    "prompt_tokens": 1060,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}
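The zero-token usage payloads above can be checked programmatically. Below is a small illustrative helper (not part of any SDK) that classifies a Messages API usage object, which is handy when reproducing this bug across calls:

```typescript
// Classify an Anthropic Messages API `usage` object to see whether
// prompt caching took effect. Field names match the usage payloads
// shown above; the helper itself is illustrative only.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function cacheStatus(usage: Usage): "cache-write" | "cache-hit" | "no-caching" {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "cache-hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "cache-write";
  return "no-caching";
}

// The usage block returned through Portkey (step 2 above):
const viaPortkey: Usage = {
  input_tokens: 1060,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 0,
};

console.log(cacheStatus(viaPortkey)); // "no-caching" — the bug
```

On a correctly forwarded first call this should report "cache-write", and "cache-hit" on subsequent calls with the same prefix.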

Expected Behavior

cache_control blocks should be forwarded to the Vertex AI Anthropic endpoint, and the response should show cache_creation_input_tokens > 0 on the first call and cache_read_input_tokens > 0 on subsequent calls with the same cached prefix.

Confirmed Working Without Portkey

When sending the same request directly to Vertex AI using @anthropic-ai/vertex-sdk (bypassing Portkey), prompt caching works correctly — showing 139K+ cached read tokens on subsequent calls.
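A sketch of that direct call with @anthropic-ai/vertex-sdk, for comparison. The project ID and model name below are placeholders, not values from this report; the request params are the same shape Portkey receives, which is what points at the gateway as the culprit:

```typescript
// With @anthropic-ai/vertex-sdk you would construct the client and send
// the same body (placeholders marked; requires GCP credentials):
//
//   import { AnthropicVertex } from "@anthropic-ai/vertex-sdk";
//   const client = new AnthropicVertex({ projectId: "my-gcp-project", region: "global" });
//   const msg = await client.messages.create(requestParams);
//
// The params mirror the Portkey request from the repro above:
const requestParams = {
  model: "claude-sonnet-4-6", // placeholder Vertex model ID
  max_tokens: 50,
  system: [
    {
      type: "text",
      text: "<large text content over 1024 tokens>",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "hello" }],
};

// Sent directly, the first response reports cache_creation_input_tokens > 0,
// and subsequent calls report cache_read_input_tokens > 0.
console.log(requestParams.system[0].cache_control.type);
```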

Environment

  • Portkey hosted gateway (api.portkey.ai)
  • Provider: Vertex AI (Anthropic Claude models, global region)
  • Models tested: anthropic.claude-sonnet-4-6, anthropic.claude-opus-4-6
  • Both /v1/messages and /v1/chat/completions endpoints tested
  • Content exceeds the 1024-token minimum required for caching

Impact

Users who route Anthropic Vertex AI requests through Portkey cannot use prompt caching, which significantly increases costs and latency for applications with large system prompts or conversation contexts.
