Problem
`LLMResponse` has no field to carry reasoning tokens separately from the substantive response. For thinking models (those with `ModelCapabilities.thinkingModel: true` — e.g. GLM-4.7-Flash, DeepSeek-R1, QwQ, and `@cf/zai-org/glm-5.2` on the CF catalog), the chain-of-thought trace either:
- Bleeds into `response.message`, so consumers receive interleaved reasoning + answer
- Gets stripped somewhere in the provider implementation and is lost entirely
Both outcomes break the `LLMResponse` contract. Consumers should receive the substantive answer in `message` and — if the model emits reasoning — the reasoning trace in a separate optional field.
Why this is a library concern, not a consumer concern
The library already acknowledges the problem in `ModelCapabilities`:
> True for models that output chain-of-thought reasoning traces as part of their response... These models are unsuitable for direct-response routing (summary, classification, chat) unless the caller explicitly handles reasoning traces.
But there is no mechanism for the caller to handle them — `LLMResponse` does not expose `reasoning`. So consumers are forced to either paper over it with their own stripping logic (wrong layer) or route away from thinking models entirely (unnecessary restriction).
Current state — per-provider gaps
- Cloudflare (`cloudflare.ts:979`): maps `reasoning_content` into `message` when `content` is absent, a direct contract violation for `@cf/zai-org/glm-5.2`
- Cerebras (`cerebras.ts:535`): only reads `choice.message.content`; parsed thinking fields are not surfaced
- Anthropic (`anthropic.ts:114`): the schema only accepts `text` and `tool_use` blocks, so extended-thinking blocks are skipped or schema-sensitive until they are modeled
- Groq (`groq.ts:684`): already preserves `choice.message.reasoning`, but under `metadata.reasoning`; it should be moved or mirrored to top-level `reasoning`
- Canonical (`canonical.ts:237`): `normalizeLLMResponse()` has no `reasoning` pass-through
Proposed fix
1. Add `reasoning?: string` to `LLMResponse` and `CanonicalLLMResponse`
```typescript
export interface LLMResponse {
  id?: string;
  message: string;    // substantive answer only; never contains reasoning tokens
  reasoning?: string; // chain-of-thought trace, when the model emits it
  // ... rest unchanged
}
```
Contract: for any model where `thinkingModel: true`, the provider implementation must extract reasoning from the raw completion and place it in `reasoning`. `message` must contain only the substantive answer.
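To make the contract checkable in provider tests, a small guard could report violations. This is a minimal sketch: `reasoningContractViolations` is a hypothetical helper, the local `LLMResponse` copy omits all other fields, and treating an empty `reasoning` string as a violation is an assumed convention, not part of this proposal. The tag check only catches `<think>`-style leakage; untagged reasoning prose in `message` cannot be detected this way.

```typescript
// Local copy of the proposed shape; other LLMResponse fields are omitted.
interface LLMResponse {
  id?: string;
  message: string;    // substantive answer only
  reasoning?: string; // chain-of-thought trace, when emitted
}

// Hypothetical test guard: returns a list of contract violations (empty = OK).
function reasoningContractViolations(response: LLMResponse): string[] {
  const violations: string[] = [];
  // Detects only tag-delimited leakage such as "<think>...</think>" in message.
  if (/<\/?think>/i.test(response.message)) {
    violations.push("message contains <think> tags");
  }
  // Assumed convention: omit the field rather than sending an empty trace.
  if (response.reasoning !== undefined && response.reasoning.trim() === "") {
    violations.push("reasoning is present but empty");
  }
  return violations;
}
```

A provider test would then assert that the returned list is empty for every fixture of a `thinkingModel: true` model.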
2. Update `normalizeLLMResponse()` to preserve `reasoning`
Pass `reasoning` through the canonical normalization layer so it survives any response-shaping that happens post-provider.
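The pass-through itself is small. The sketch below assumes simplified shapes; the real `normalizeLLMResponse()` signature and `CanonicalLLMResponse` fields are not shown in this issue, so `ProviderResponse` and `normalizeLLMResponseSketch` are hypothetical stand-ins that illustrate only the `reasoning` handling.

```typescript
// Hypothetical, simplified shapes; the real types carry more fields.
interface ProviderResponse {
  message: string;
  reasoning?: string;
  [extra: string]: unknown;
}

interface CanonicalLLMResponse {
  message: string;
  reasoning?: string;
}

// Copies `reasoning` only when it is a non-empty string, so canonical
// responses never carry an explicit `reasoning: undefined` or "" key.
function normalizeLLMResponseSketch(raw: ProviderResponse): CanonicalLLMResponse {
  const normalized: CanonicalLLMResponse = { message: raw.message };
  if (typeof raw.reasoning === "string" && raw.reasoning.trim() !== "") {
    normalized.reasoning = raw.reasoning;
  }
  return normalized;
}
```

Omitting the key (rather than setting it to `undefined`) keeps deep-equality assertions and JSON snapshots stable for non-thinking models.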
3. Add shared extraction helpers
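```typescript
// utils/reasoning.ts
// Splits a leading <think>...</think> block off a raw completion.
export function extractThinkBlock(raw: string): { message: string; reasoning?: string } {
  // Tolerate leading whitespace before <think>; the lazy group stops at the first </think>.
  const match = raw.match(/^\s*<think>([\s\S]*?)<\/think>\s*/i);
  if (!match) return { message: raw };
  const message = raw.slice(match[0].length);
  const reasoning = match[1].trim();
  // An empty block such as "<think>\n</think>" yields no `reasoning` field at all.
  return reasoning ? { message, reasoning } : { message };
}
```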
Handles `<think>...</think>` blocks emitted by CF glm-5.2 and similar models that don't use native reasoning fields.
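Some deployments of reasoning models inject the opening `<think>` through the chat template, so the completion contains only the closing `</think>`. Whether any model in the CF catalog does this is not established here; the variant below is a hypothetical sketch (`splitReasoning` and `withReasoning` are illustrative names) of how a shared helper could cover that shape as well.

```typescript
interface SplitResult {
  message: string;
  reasoning?: string;
}

// Hypothetical variant of extractThinkBlock that also handles a missing opening tag.
function splitReasoning(raw: string): SplitResult {
  // Case 1: a full leading <think>...</think> block (leading whitespace allowed).
  const full = raw.match(/^\s*<think>([\s\S]*?)<\/think>\s*/i);
  if (full) {
    return withReasoning(raw.slice(full[0].length), full[1] ?? "");
  }
  // Case 2: no <think> anywhere but a </think> present; everything before it is the trace.
  const close = raw.match(/<\/think>\s*/i);
  if (close && close.index !== undefined && !/<think>/i.test(raw)) {
    return withReasoning(raw.slice(close.index + close[0].length), raw.slice(0, close.index));
  }
  // Case 3: no recognizable tags; the whole completion is the answer.
  return { message: raw };
}

// Drops an empty trace instead of returning reasoning: "".
function withReasoning(message: string, trace: string): SplitResult {
  const reasoning = trace.trim();
  return reasoning ? { message, reasoning } : { message };
}
```

Case 2 deliberately requires that no opening tag appear anywhere, so a stray `<think>` later in an answer is left untouched rather than guessed at.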
4. Per-provider fixes (recommended order)
- Cloudflare — fix the `reasoning_content → message` fallback path; apply `extractThinkBlock()` to completions from `thinkingModel: true` catalog entries
- Groq — move `metadata.reasoning` to top-level `response.reasoning` (data already captured, just in wrong slot)
- Cerebras — map `choice.message.thinking_content` → `reasoning` when `reasoning.format: 'parsed'`
- Anthropic — expand response schema/types to accept `thinking` content blocks, then map to `reasoning`
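For the three OpenAI-compatible providers the remap has the same shape. The sketch below is hypothetical: `RawChoiceMessage` models only the fields named in the notes above (`reasoning_content` for Cloudflare, `reasoning` for Groq, `thinking_content` for Cerebras), the lookup order among them is an assumption, and the `extractThinkBlock()` step for `<think>`-tagged Cloudflare content is left out.

```typescript
// Hypothetical, minimal view of an OpenAI-compatible choice message.
interface RawChoiceMessage {
  content?: string | null;
  reasoning_content?: string | null; // Cloudflare
  reasoning?: string | null;         // Groq
  thinking_content?: string | null;  // Cerebras (parsed format)
}

interface MappedResponse {
  message: string;
  reasoning?: string;
}

function nonEmpty(value: string | null | undefined): string | undefined {
  return typeof value === "string" && value.trim() !== "" ? value : undefined;
}

// Native reasoning fields go to `reasoning`; `content` is the only source for
// `message`. There is no fallback from reasoning_content into message.
function mapChoiceMessage(raw: RawChoiceMessage): MappedResponse {
  const reasoning =
    nonEmpty(raw.reasoning_content) ?? nonEmpty(raw.reasoning) ?? nonEmpty(raw.thinking_content);
  const message = raw.content ?? "";
  return reasoning === undefined ? { message } : { message, reasoning };
}
```

When a completion is cut off before any answer (only `reasoning_content` present), this yields an empty `message` plus `reasoning`, which surfaces the truncation to the caller instead of disguising the trace as an answer.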
Anthropic is the long pole (schema expansion is required before field mapping); the others are straightforward remapping.
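The Anthropic Messages API returns extended thinking as `thinking` content blocks (with a `thinking` text field and a `signature`) and, in some cases, `redacted_thinking` blocks carrying opaque `data`. A sketch of the expanded block union and the mapping, assuming text blocks are concatenated without a separator (the current provider's joining behavior is not shown here); `mapAnthropicContent` is a hypothetical name.

```typescript
// Minimal local union of Anthropic content blocks relevant to the mapping.
type AnthropicContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string };

interface MappedResponse {
  message: string;
  reasoning?: string;
}

// Text blocks become `message`; thinking blocks become `reasoning`.
// redacted_thinking holds encrypted data, not readable text, so it is skipped.
function mapAnthropicContent(blocks: AnthropicContentBlock[]): MappedResponse {
  const text: string[] = [];
  const thoughts: string[] = [];
  for (const block of blocks) {
    if (block.type === "text") text.push(block.text);
    else if (block.type === "thinking") thoughts.push(block.thinking);
    // tool_use stays with the existing tool-call mapping.
  }
  const message = text.join("");
  const reasoning = thoughts.join("\n\n").trim();
  return reasoning ? { message, reasoning } : { message };
}
```

If the provider supports multi-turn tool use with extended thinking, Anthropic's documentation asks for thinking blocks to be passed back unmodified, so the raw blocks (including `signature`) may also need to be retained somewhere other than the flattened `reasoning` string.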
5. Unit tests with fixture responses
Add focused provider unit tests using captured fixture responses for each known reasoning shape before any live eval work. No cross-repo dependencies needed.
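A fixture table keeps each known reasoning shape as one row. The sketch below is framework-free so it runs standalone (Vitest or Jest `it.each` could drive the same table); the fixture strings are placeholders written for illustration, not captured provider output, and `extractThinkBlock` is a local version of the helper from step 3.

```typescript
// Local version of the step 3 helper so the sketch is self-contained.
function extractThinkBlock(raw: string): { message: string; reasoning?: string } {
  const match = raw.match(/^\s*<think>([\s\S]*?)<\/think>\s*/i);
  if (!match) return { message: raw };
  const reasoning = (match[1] ?? "").trim();
  const message = raw.slice(match[0].length);
  return reasoning ? { message, reasoning } : { message };
}

interface Fixture {
  name: string;
  raw: string; // placeholder; real fixtures would be captured completions
  expected: { message: string; reasoning?: string };
}

const fixtures: Fixture[] = [
  { name: "think block then answer", raw: "<think>check the diff</think>\nLGTM", expected: { message: "LGTM", reasoning: "check the diff" } },
  { name: "no reasoning", raw: "LGTM", expected: { message: "LGTM" } },
  { name: "uppercase tags", raw: "<THINK>x</THINK>y", expected: { message: "y", reasoning: "x" } },
  { name: "leading whitespace", raw: " \n<think>a</think>b", expected: { message: "b", reasoning: "a" } },
  { name: "empty think block", raw: "<think>\n</think>ok", expected: { message: "ok" } },
];

// Table-driven runner: returns the names of failing fixtures.
function runFixtures(cases: Fixture[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = extractThinkBlock(c.raw);
    if (actual.message !== c.expected.message || actual.reasoning !== c.expected.reasoning) {
      failures.push(c.name);
    }
  }
  return failures;
}
```

Per-provider suites would swap `extractThinkBlock` for the provider's response mapper and load `raw` from captured JSON files.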
Acceptance criteria
Future
Model selection evals (deciding which CF Workers AI model to use for which task type) are a natural application of `@stackbilt/evals` once this contract is stable — but that's non-blocking follow-up work, not acceptance criteria here.
Context
Surfaced from `aegis-daemon` cheap-llm routing — `@cf/zai-org/glm-5.2` is the intended model for the `patch` task, but its CoT output currently has no safe landing spot in `LLMResponse`. Rather than add stripping logic in the consumer, this belongs at the abstraction layer.
Related: #87 (provider contracts hardening)