
feat: add LLMResponse.reasoning field — separate reasoning tokens from message for thinking-model support #95

Description

@stackbilt-admin

Problem

`LLMResponse` has no field to carry reasoning tokens separately from the substantive response. For thinking models (those with `ModelCapabilities.thinkingModel: true` — e.g. GLM-4.7-Flash, DeepSeek-R1, QwQ, and `@cf/zai-org/glm-5.2` on the CF catalog), the chain-of-thought trace either:

  • Bleeds into `response.message`, so consumers receive interleaved reasoning + answer
  • Gets stripped somewhere in the provider implementation and is lost entirely

Both outcomes break the `LLMResponse` contract. Consumers should receive the substantive answer in `message` and — if the model emits reasoning — the reasoning trace in a separate optional field.

Why this is a library concern, not a consumer concern

The library already acknowledges the problem in `ModelCapabilities`:

"True for models that output chain-of-thought reasoning traces as part of their response... These models are unsuitable for direct-response routing (summary, classification, chat) unless the caller explicitly handles reasoning traces."

But there is no mechanism for the caller to handle them — `LLMResponse` does not expose `reasoning`. So consumers are forced to either paper over it with their own stripping logic (wrong layer) or route away from thinking models entirely (unnecessary restriction).

Current state — per-provider gaps

  • Cloudflare (cloudflare.ts:979): maps `reasoning_content` into `message` when `content` is absent — direct contract violation for `@cf/zai-org/glm-5.2`
  • Cerebras (cerebras.ts:535): only reads `choice.message.content`; parsed thinking fields are not surfaced
  • Anthropic (anthropic.ts:114): schema only accepts `text` and `tool_use` blocks, so extended-thinking blocks are skipped or rejected, depending on how strictly the schema is applied, until they are modeled
  • Groq (groq.ts:684): already preserves `choice.message.reasoning` but under `metadata.reasoning` — should move or be mirrored to top-level `reasoning`
  • Canonical (canonical.ts:237): `normalizeLLMResponse()` has no `reasoning` pass-through

Proposed fix

1. Add `reasoning?: string` to `LLMResponse` and `CanonicalLLMResponse`

```typescript
export interface LLMResponse {
  id?: string;
  message: string;        // substantive answer only; never contains reasoning tokens
  reasoning?: string;     // chain-of-thought trace, when the model emits it
  // ... rest unchanged
}
```

Contract: for any model where `thinkingModel: true`, the provider implementation must extract reasoning from the raw completion and place it in `reasoning`. `message` must contain only the substantive answer.
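The contract above could be checked mechanically in provider tests. The following is a minimal sketch, not part of the library: `findContractViolations` is a hypothetical helper, the interface is a trimmed local copy of the proposed shape, and the only rule it checks (no `<think>` tags left in `message`) is an assumption about what a violation looks like for tag-emitting models.

```typescript
// Trimmed local copy of the proposed shape; the real interface has more fields.
interface LLMResponse {
  id?: string;
  message: string;
  reasoning?: string;
}

// Hypothetical helper: lists contract violations, empty when the response conforms.
// It only detects leaked <think> tags; reasoning that leaks without tags is not caught.
function findContractViolations(response: LLMResponse): string[] {
  const violations: string[] = [];
  if (/<\/?think>/i.test(response.message)) {
    violations.push("message contains a <think> tag");
  }
  return violations;
}

const clean = findContractViolations({ message: "The answer is 4.", reasoning: "2 + 2 = 4" });
const leaked = findContractViolations({ message: "<think>2 + 2 = 4</think>The answer is 4." });
console.log(clean.length, leaked.length);
```

A check like this is only a safety net for tests; the extraction itself still belongs in each provider.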

2. Update `normalizeLLMResponse()` to preserve `reasoning`

Pass `reasoning` through the canonical normalization layer so it survives any response-shaping that happens post-provider.
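A rough sketch of what the pass-through might look like. The real `normalizeLLMResponse()` in canonical.ts is not shown in this issue, so `normalizeWithReasoning` below is a hypothetical stand-in, and both interfaces are reduced to the fields relevant here. Skipping empty strings is an assumption, not a stated requirement.

```typescript
// Reduced stand-ins for the real types; only the relevant fields are modeled.
interface LLMResponse {
  id?: string;
  message: string;
  reasoning?: string;
}

interface CanonicalLLMResponse {
  id?: string;
  message: string;
  reasoning?: string;
}

// Hypothetical stand-in for normalizeLLMResponse(): copies `reasoning` only when
// it is a non-empty string, so the key is omitted rather than set to undefined.
function normalizeWithReasoning(response: LLMResponse): CanonicalLLMResponse {
  const canonical: CanonicalLLMResponse = { message: response.message };
  if (response.id !== undefined) canonical.id = response.id;
  if (response.reasoning !== undefined && response.reasoning !== "") {
    canonical.reasoning = response.reasoning;
  }
  return canonical;
}

const withTrace = normalizeWithReasoning({ message: "4", reasoning: "2 + 2 = 4" });
const withoutTrace = normalizeWithReasoning({ message: "hello" });
console.log(withTrace.reasoning, "reasoning" in withoutTrace);
```

Omitting the key when there is no trace keeps serialized responses for non-thinking models byte-identical to what they are today.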

3. Add shared extraction helpers

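```typescript
// utils/reasoning.ts

// Splits a leading <think>...</think> block off a raw completion.
// Only a block at the very start of the string is matched (case-insensitive);
// when none is found, the input is returned unchanged as `message`.
export function extractThinkBlock(raw: string): { message: string; reasoning?: string } {
  const match = raw.match(/^<think>([\s\S]*?)<\/think>\s*/i);
  if (!match) return { message: raw };
  return { message: raw.slice(match[0].length), reasoning: match[1].trim() };
}
```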

Handles `<think>...</think>` blocks emitted by CF glm-5.2 and similar models that don't use native reasoning fields.
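To make the helper's behavior concrete, here is a short usage sketch. The function body is copied from the proposal so the example runs on its own; the sample strings are made up for illustration. Note that, as written, the regex is anchored at the start of the string, so a completion with leading whitespace before `<think>` is passed through untouched. Whether that case should be handled is a design question for the implementation.

```typescript
// Copied from the proposed utils/reasoning.ts so this example is self-contained.
function extractThinkBlock(raw: string): { message: string; reasoning?: string } {
  const match = raw.match(/^<think>([\s\S]*?)<\/think>\s*/i);
  if (!match) return { message: raw };
  return { message: raw.slice(match[0].length), reasoning: match[1].trim() };
}

// A leading think block is split off, and trailing whitespace after it is consumed.
const split = extractThinkBlock("<think>2 + 2 is 4</think>\n\nThe answer is 4.");
// split.message   === "The answer is 4."
// split.reasoning === "2 + 2 is 4"

// No think block: the input comes back unchanged and `reasoning` is absent.
const plain = extractThinkBlock("The answer is 4.");

// Leading whitespace before the tag defeats the ^ anchor, so nothing is extracted.
const padded = extractThinkBlock("\n<think>hidden</think>visible");
// padded.message === "\n<think>hidden</think>visible"

console.log(split, plain, padded);
```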

4. Per-provider fixes (recommended order)

  1. Cloudflare — fix the `reasoning_content → message` fallback path; apply `extractThinkBlock()` to completions from `thinkingModel: true` catalog entries
  2. Groq — move `metadata.reasoning` to top-level `response.reasoning` (data already captured, just in wrong slot)
  3. Cerebras — map `choice.message.thinking_content` → `reasoning` when `reasoning.format: 'parsed'`
  4. Anthropic — expand response schema/types to accept `thinking` content blocks, then map to `reasoning`

Anthropic is the long pole (schema expansion is required before field mapping); the others are straightforward remapping.
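The two simplest remappings might look roughly like the sketch below. The raw message shapes are assumptions based on the field names quoted in this issue (`reasoning_content` for Cloudflare, `choice.message.reasoning` for Groq), and `mapCloudflareLike` / `mapGroqLike` are hypothetical names, not existing provider functions. Returning an empty `message` when `content` is absent is also an assumption; the issue only requires that reasoning no longer lands there.

```typescript
interface ReasoningParts {
  message: string;
  reasoning?: string;
}

// Assumed raw shapes, reduced to the fields discussed in this issue.
interface CloudflareLikeMessage {
  content?: string | null;
  reasoning_content?: string | null;
}

interface GroqLikeMessage {
  content?: string | null;
  reasoning?: string | null;
}

// Cloudflare: reasoning_content never backfills message, even when content is absent.
function mapCloudflareLike(raw: CloudflareLikeMessage): ReasoningParts {
  const parts: ReasoningParts = { message: raw.content ?? "" };
  if (raw.reasoning_content) parts.reasoning = raw.reasoning_content;
  return parts;
}

// Groq: expose reasoning at the top level and mirror it into metadata so
// consumers that already read metadata.reasoning keep working.
function mapGroqLike(raw: GroqLikeMessage): ReasoningParts & { metadata: { reasoning?: string } } {
  const mapped: ReasoningParts & { metadata: { reasoning?: string } } = {
    message: raw.content ?? "",
    metadata: {},
  };
  if (raw.reasoning) {
    mapped.reasoning = raw.reasoning;
    mapped.metadata.reasoning = raw.reasoning;
  }
  return mapped;
}

const cf = mapCloudflareLike({ content: null, reasoning_content: "step 1, step 2" });
const groq = mapGroqLike({ content: "4", reasoning: "2 + 2 = 4" });
console.log(cf, groq);
```

Mirroring rather than moving the Groq field avoids a breaking change for existing readers of `metadata.reasoning`; the mirror can be dropped in a later major version.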

5. Unit tests with fixture responses

Add focused provider unit tests using captured fixture responses for each known reasoning shape before any live eval work. No cross-repo dependencies needed.
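One possible shape for such fixture-driven tests, sketched without any test framework so it runs standalone. Everything here is illustrative: `Fixture`, `runFixtures`, the sample payloads, and the stand-in mapper are hypothetical, and real fixtures would be captured provider responses rather than hand-written objects.

```typescript
interface ReasoningParts {
  message: string;
  reasoning?: string;
}

// A fixture pairs a captured raw payload with the split the provider should produce.
interface Fixture<Raw> {
  name: string;
  raw: Raw;
  expected: ReasoningParts;
}

// Runs every fixture through the mapper under test and returns the names of failures.
function runFixtures<Raw>(fixtures: Fixture<Raw>[], map: (raw: Raw) => ReasoningParts): string[] {
  return fixtures
    .filter(({ raw, expected }) => {
      const actual = map(raw);
      return actual.message !== expected.message || actual.reasoning !== expected.reasoning;
    })
    .map((fixture) => fixture.name);
}

// Hypothetical Groq-style payloads, trimmed to the relevant fields.
interface GroqLikeMessage {
  content: string;
  reasoning?: string;
}

const groqFixtures: Fixture<GroqLikeMessage>[] = [
  {
    name: "with reasoning",
    raw: { content: "4", reasoning: "2 + 2 = 4" },
    expected: { message: "4", reasoning: "2 + 2 = 4" },
  },
  { name: "without reasoning", raw: { content: "hello" }, expected: { message: "hello" } },
];

// Stand-in for the provider mapping that the real tests would import.
const mapGroqLike = (raw: GroqLikeMessage): ReasoningParts =>
  raw.reasoning ? { message: raw.content, reasoning: raw.reasoning } : { message: raw.content };

const failures = runFixtures(groqFixtures, mapGroqLike);
if (failures.length > 0) throw new Error(`failing fixtures: ${failures.join(", ")}`);
```

Keeping one fixture table per provider makes it cheap to add a new reasoning shape later: capture the payload, add a row, and the same runner covers it.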

Acceptance criteria

  • `reasoning?: string` added to `LLMResponse` and `CanonicalLLMResponse` in `types.ts`
  • `normalizeLLMResponse()` passes `reasoning` through
  • `extractThinkBlock()` helper in `utils/reasoning.ts` with unit tests
  • Cloudflare provider: `reasoning_content` no longer falls back into `message`; `<think>` blocks extracted for `thinkingModel` entries
  • Groq provider: `metadata.reasoning` mirrored/moved to `response.reasoning`
  • Cerebras provider: `thinking_content` mapped to `reasoning` under `reasoning.format: 'parsed'`
  • Anthropic provider: thinking blocks modeled in response schema and mapped to `reasoning`
  • CHANGELOG updated — this is an additive public contract change

Future

Model selection evals (deciding which CF Workers AI model to use for which task type) are a natural application of `@stackbilt/evals` once this contract is stable — but that's non-blocking follow-up work, not acceptance criteria here.

Context

Surfaced from `aegis-daemon` cheap-llm routing — `@cf/zai-org/glm-5.2` is the intended model for the `patch` task, but its CoT output currently has no safe landing spot in `LLMResponse`. Rather than add stripping logic in the consumer, this belongs at the abstraction layer.

Related: #87 (provider contracts hardening)


Labels: enhancement
