feat: add LLMResponse.reasoning field to separate reasoning tokens from message for thinking-model support (#95)

## Problem

\`LLMResponse\` has no field to carry reasoning tokens separately from the substantive response. For thinking models (those with \`ModelCapabilities.thinkingModel: true\` — e.g. GLM-4.7-Flash, DeepSeek-R1, QwQ, and \`@cf/zai-org/glm-5.2\` on the CF catalog), the chain-of-thought trace either:

- Bleeds into \`response.message\`, so consumers receive interleaved reasoning + answer
- Gets stripped somewhere in the provider implementation and is lost entirely

Both outcomes break the \`LLMResponse\` contract. Consumers should receive the substantive answer in \`message\` and — if the model emits reasoning — the reasoning trace in a separate optional field.

### Why this is a library concern, not a consumer concern

The library already acknowledges the problem in \`ModelCapabilities\`:

> True for models that output chain-of-thought reasoning traces as part of their response... These models are unsuitable for direct-response routing (summary, classification, chat) unless the caller **explicitly handles reasoning traces**.

But there is no mechanism for the caller to handle them — \`LLMResponse\` does not expose \`reasoning\`. So consumers are forced to either paper over it with their own stripping logic (wrong layer) or route away from thinking models entirely (unnecessary restriction).

### Current state — per-provider gaps

- **Cloudflare** (`cloudflare.ts:979`): maps \`reasoning_content\` into \`message\` when \`content\` is absent — direct contract violation for \`@cf/zai-org/glm-5.2\`
- **Cerebras** (`cerebras.ts:535`): only reads \`choice.message.content\`; parsed thinking fields are not surfaced
- **Anthropic** (`anthropic.ts:114`): schema only accepts \`text\` and \`tool_use\` blocks, so extended-thinking blocks are either skipped or at risk of failing schema validation until the schema models them
- **Groq** (`groq.ts:684`): already preserves \`choice.message.reasoning\` but under \`metadata.reasoning\` — should move or be mirrored to top-level \`reasoning\`
- **Canonical** (`canonical.ts:237`): \`normalizeLLMResponse()\` has no \`reasoning\` pass-through

## Proposed fix

### 1. Add \`reasoning?: string\` to \`LLMResponse\` and \`CanonicalLLMResponse\`

```typescript
export interface LLMResponse {
  id?: string;
  message: string;        // substantive answer only — never contains reasoning tokens
  reasoning?: string;     // chain-of-thought trace, when the model emits it
  // ... rest unchanged
}
```

**Contract**: for any model where \`thinkingModel: true\`, the provider implementation must extract reasoning from the raw completion and place it in \`reasoning\`. \`message\` must contain only the substantive answer.
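
From the consumer side, the contract could be exercised roughly as follows. This is only a sketch: \`LLMResponseSketch\` is a trimmed stand-in for the real interface, and \`renderForUser\` and \`describeReasoning\` are hypothetical consumer helpers, not part of the library.

```typescript
// Trimmed stand-in for the proposed interface; the real type has more fields.
interface LLMResponseSketch {
  id?: string;
  message: string;     // substantive answer only
  reasoning?: string;  // chain-of-thought trace, when the model emits one
}

// Hypothetical consumer helper: only the answer is ever shown to the user.
function renderForUser(response: LLMResponseSketch): string {
  return response.message.trim();
}

// Hypothetical consumer helper: reasoning is optional, so guard before using it.
function describeReasoning(response: LLMResponseSketch): string {
  return response.reasoning === undefined
    ? "no reasoning trace"
    : `reasoning trace (${response.reasoning.length} chars)`;
}

// Made-up responses: one from a thinking model, one from a regular model.
const thinkingResponse: LLMResponseSketch = {
  message: "The patch applies cleanly.",
  reasoning: "Compare hunks against HEAD, then check context lines.",
};
const plainResponse: LLMResponseSketch = { message: "OK" };
```

With this shape a consumer never needs its own stripping logic: it reads \`message\` unconditionally and treats \`reasoning\` as optional diagnostics.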

### 2. Update \`normalizeLLMResponse()\` to preserve \`reasoning\`

Pass \`reasoning\` through the canonical normalization layer so it survives any response-shaping that happens post-provider.
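
The body of \`normalizeLLMResponse()\` is not reproduced in this issue, so the following is a sketch of the pass-through behavior only. The type and function names (\`ProviderResponseSketch\`, \`CanonicalResponseSketch\`, \`normalizeSketch\`) are hypothetical, and skipping empty traces is an assumption rather than a stated requirement.

```typescript
// Hypothetical, trimmed shapes; the real types in types.ts carry more fields.
interface ProviderResponseSketch {
  id?: string;
  message: string;
  reasoning?: string;
}

interface CanonicalResponseSketch {
  id?: string;
  message: string;
  reasoning?: string;
}

// Assumed behavior: copy `reasoning` through only when the provider supplied a
// non-empty trace, so the key stays absent for non-thinking models.
function normalizeSketch(response: ProviderResponseSketch): CanonicalResponseSketch {
  const canonical: CanonicalResponseSketch = { message: response.message };
  if (response.id !== undefined) canonical.id = response.id;
  if (response.reasoning !== undefined && response.reasoning.length > 0) {
    canonical.reasoning = response.reasoning;
  }
  return canonical;
}
```

Omitting the key, instead of setting it to \`undefined\`, keeps \`"reasoning" in response\` checks and serialized snapshots unchanged for existing non-thinking models. That is a design choice for this sketch, not something the issue mandates.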

### 3. Add shared extraction helpers

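```typescript
// utils/reasoning.ts
// Splits a leading <think>...</think> block off a raw completion. Only a block at
// the very start of the string is recognized; other input is returned unchanged.
export function extractThinkBlock(raw: string): { message: string; reasoning?: string } {
  // Non-greedy: stop at the first closing tag, then swallow trailing whitespace.
  const match = raw.match(/^<think>([\s\S]*?)<\/think>\s*/i);
  if (!match) return { message: raw };
  return { message: raw.slice(match[0].length), reasoning: match[1].trim() };
}
```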

Handles \`<think>...</think>\` blocks emitted by CF glm-5.2 and similar models that don't use native reasoning fields.
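
A few sample calls make the helper's behavior concrete. The function is repeated here (without the \`export\`) so the snippet runs on its own; the completions are made up for illustration and are not captured model output.

```typescript
function extractThinkBlock(raw: string): { message: string; reasoning?: string } {
  const match = raw.match(/^<think>([\s\S]*?)<\/think>\s*/i);
  if (!match) return { message: raw };
  return { message: raw.slice(match[0].length), reasoning: match[1].trim() };
}

// Leading block: the trace is split off and trimmed.
const split = extractThinkBlock("<think> step 1 </think>\n\nFinal answer");
// split.message === "Final answer", split.reasoning === "step 1"

// No block: the input comes back untouched and `reasoning` is absent.
const untouched = extractThinkBlock("Final answer");
// untouched.message === "Final answer", untouched.reasoning === undefined

// Tag matching is case-insensitive because of the `i` flag.
const upper = extractThinkBlock("<THINK>a</THINK>b");
// upper.message === "b", upper.reasoning === "a"

// The pattern is anchored at the start, so leading whitespace prevents a match.
const leadingNewline = extractThinkBlock("\n<think>x</think>answer");
// leadingNewline.message === "\n<think>x</think>answer"

// An unterminated block is also left in place.
const unterminated = extractThinkBlock("<think>never closed");
// unterminated.message === "<think>never closed"
```

The last two cases are worth covering in the unit tests: whether the helper should tolerate leading whitespace or an unterminated block (for example, when output is truncated by a token limit) is a decision this issue leaves open.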

### 4. Per-provider fixes (recommended order)

1. **Cloudflare** — fix the \`reasoning_content → message\` fallback path; apply \`extractThinkBlock()\` to completions from \`thinkingModel: true\` catalog entries
2. **Groq** — move \`metadata.reasoning\` to top-level \`response.reasoning\` (data already captured, just in wrong slot)
3. **Cerebras** — map \`choice.message.thinking_content\` → \`reasoning\` when \`reasoning.format: 'parsed'\`
4. **Anthropic** — expand response schema/types to accept \`thinking\` content blocks, then map to \`reasoning\`
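
The first two items amount to moving a value that already exists into the right slot. A rough sketch, assuming raw shapes inferred only from the field names mentioned in this issue (\`content\`, \`reasoning_content\`, \`metadata.reasoning\`); all type and function names below are hypothetical and do not mirror the providers' actual code.

```typescript
// Assumed raw message shape, based only on the field names mentioned above.
interface RawMessageSketch {
  content?: string | null;
  reasoning_content?: string | null;
}

interface MappedSketch {
  message: string;
  reasoning?: string;
  metadata?: { reasoning?: string; [key: string]: unknown };
}

// Cloudflare-style fix: reasoning_content never falls back into message.
function mapRawMessageSketch(raw: RawMessageSketch): MappedSketch {
  const mapped: MappedSketch = { message: raw.content ?? "" };
  if (raw.reasoning_content) mapped.reasoning = raw.reasoning_content;
  return mapped;
}

// Groq-style fix: mirror metadata.reasoning to the top level, leaving metadata
// intact so existing readers of metadata.reasoning keep working.
function promoteReasoningSketch(response: MappedSketch): MappedSketch {
  const trace = response.metadata?.reasoning;
  if (trace === undefined || response.reasoning !== undefined) return response;
  return { ...response, reasoning: trace };
}
```

In this sketch a completion that carries only reasoning yields an empty \`message\` plus a populated \`reasoning\`, which lets the caller see that no answer was produced. Whether an empty \`message\` should instead be reported as an error is left to the provider implementation.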

Anthropic is the long pole (schema expansion required before field mapping); the others are straightforward remapping.
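
For the Anthropic step, the schema expansion and mapping might look roughly like this. The block shapes follow the content-block types in Anthropic's public Messages API (\`text\`, \`thinking\`, \`tool_use\`); the union type and \`splitAnthropicBlocksSketch\` are hypothetical, since the library's current schema is not reproduced in this issue.

```typescript
// Hypothetical response-block union, extended with a `thinking` variant.
type ContentBlockSketch =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature?: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Collects text blocks into `message` and thinking blocks into `reasoning`.
function splitAnthropicBlocksSketch(
  blocks: ContentBlockSketch[],
): { message: string; reasoning?: string } {
  const text: string[] = [];
  const thinking: string[] = [];
  for (const block of blocks) {
    if (block.type === "text") text.push(block.text);
    else if (block.type === "thinking") thinking.push(block.thinking);
    // tool_use blocks are ignored here; they are assumed to be handled elsewhere.
  }
  const result: { message: string; reasoning?: string } = { message: text.join("") };
  if (thinking.length > 0) result.reasoning = thinking.join("\n");
  return result;
}
```

Joining multiple thinking blocks with a newline is an arbitrary choice for the sketch. Redacted thinking blocks, if the provider needs to accept them, would require a further variant in the union.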

### 5. Unit tests with fixture responses

Add focused provider unit tests using captured fixture responses for each known reasoning shape before any live eval work. No cross-repo dependencies needed.
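
One possible shape for such tests is a fixture table run through each provider's mapping function. The sketch below is framework-agnostic and entirely hypothetical: \`ReasoningFixture\`, \`runFixtures\`, and \`stubMap\` are illustrative names, and the fixture payloads are invented rather than captured from a provider.

```typescript
interface ReasoningFixture<Raw> {
  name: string;
  raw: Raw; // captured provider payload (invented here)
  expected: { message: string; reasoning?: string };
}

// Runs every fixture through `map` and returns the names of the failing ones.
function runFixtures<Raw>(
  fixtures: ReasoningFixture<Raw>[],
  map: (raw: Raw) => { message: string; reasoning?: string },
): string[] {
  const failures: string[] = [];
  for (const fixture of fixtures) {
    const actual = map(fixture.raw);
    const same =
      actual.message === fixture.expected.message &&
      actual.reasoning === fixture.expected.reasoning;
    if (!same) failures.push(fixture.name);
  }
  return failures;
}

// Stand-in for a provider's response mapping; a real test would import the provider.
type StubRaw = { content: string; reasoning?: string };
function stubMap(raw: StubRaw): { message: string; reasoning?: string } {
  const mapped: { message: string; reasoning?: string } = { message: raw.content };
  if (raw.reasoning !== undefined) mapped.reasoning = raw.reasoning;
  return mapped;
}

const stubFixtures: ReasoningFixture<StubRaw>[] = [
  { name: "answer only", raw: { content: "42" }, expected: { message: "42" } },
  {
    name: "answer with trace",
    raw: { content: "42", reasoning: "6 * 7" },
    expected: { message: "42", reasoning: "6 * 7" },
  },
];

const fixtureFailures = runFixtures(stubFixtures, stubMap);
```

In the repository, the returned list would be asserted empty with whatever test runner the project already uses; one fixture file per known reasoning shape keeps regressions easy to localize.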

## Acceptance criteria

- [ ] \`reasoning?: string\` added to \`LLMResponse\` and \`CanonicalLLMResponse\` in \`types.ts\`
- [ ] \`normalizeLLMResponse()\` passes \`reasoning\` through
- [ ] \`extractThinkBlock()\` helper in \`utils/reasoning.ts\` with unit tests
- [ ] Cloudflare provider: \`reasoning_content\` no longer falls back into \`message\`; \`<think>\` blocks extracted for \`thinkingModel\` entries
- [ ] Groq provider: \`metadata.reasoning\` mirrored/moved to \`response.reasoning\`
- [ ] Cerebras provider: \`thinking_content\` mapped to \`reasoning\` under \`reasoning.format: 'parsed'\`
- [ ] Anthropic provider: thinking blocks modeled in response schema and mapped to \`reasoning\`
- [ ] CHANGELOG updated — this is an additive public contract change

## Future

Model selection evals (deciding *which* CF Workers AI model to use for which task type) are a natural application of \`@stackbilt/evals\` once this contract is stable — but that's non-blocking follow-up work, not acceptance criteria here.

## Context

Surfaced from \`aegis-daemon\` cheap-llm routing — \`@cf/zai-org/glm-5.2\` is the intended model for the \`patch\` task, but its CoT output currently has no safe landing spot in \`LLMResponse\`. Rather than add stripping logic in the consumer, this belongs at the abstraction layer.

Related: #87 (provider contracts hardening)
