
feat(vllm-serve): add --reasoning-parser and --reasoning-config flags#5672

Open
kfirah-create wants to merge 1 commit into huggingface:main from kfirah-create:feat/vllm-serve-reasoning-config

Conversation


@kfirah-create kfirah-create commented Apr 28, 2026

What does this PR do?

trl vllm-serve currently doesn't expose vLLM's --reasoning-parser or --reasoning-config options. When using thinking models (Qwen3.5, DeepSeek-R1, etc.) with GRPO and thinking_token_budget in generation_kwargs, the vLLM engine returns a 400 error:

thinking_token_budget is set but reasoning_config is not configured. Please set --reasoning-config to use thinking_token_budget.

This is because vLLM requires a ReasoningConfig on the engine to activate its ThinkingTokenBudgetLogitsProcessor, but trl vllm-serve has no way to set it.

This PR adds two new ScriptArguments fields in vllm_serve.py:

  • reasoning_parser — e.g., "qwen3", "deepseek_r1" — forwarded to LLM(reasoning_parser=...)
  • reasoning_config — JSON string forwarded to LLM(reasoning_config=...) for explicit delimiter control

Both are passed through to the LLM() constructor, enabling thinking_token_budget in generation_kwargs to work end-to-end.
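A minimal sketch of the pass-through described above (field names match this PR; the exact structure of vllm_serve.py, the help text, and the kwargs dict are illustrative assumptions, not the actual diff):

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScriptArguments:
    # New fields mirroring the flags added in this PR; help text is illustrative.
    reasoning_parser: Optional[str] = field(
        default=None,
        metadata={"help": "vLLM reasoning parser name, e.g. 'qwen3' or 'deepseek_r1'."},
    )
    reasoning_config: Optional[str] = field(
        default=None,
        metadata={"help": "JSON string forwarded to vLLM as reasoning_config."},
    )


def build_reasoning_kwargs(args: ScriptArguments) -> dict:
    """Collect only the kwargs that were actually set, so the LLM() call
    is unchanged for users who never pass the new flags."""
    kwargs = {}
    if args.reasoning_parser is not None:
        kwargs["reasoning_parser"] = args.reasoning_parser
    if args.reasoning_config is not None:
        # The PR forwards reasoning_config parsed from its JSON string form.
        kwargs["reasoning_config"] = json.loads(args.reasoning_config)
    return kwargs
```

These kwargs would then be splatted into the existing `LLM(...)` call, leaving default behavior untouched when neither flag is given.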

Example usage:

    trl vllm-serve \
        --model Qwen/Qwen3.5-9B \
        --reasoning_parser qwen3

Then in the training script:

    GRPOConfig(
        ...,
        generation_kwargs={"thinking_token_budget": 1024},
    )

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec — this enables thinking_token_budget support for thinking models (Qwen3.5, DeepSeek-R1) when using trl vllm-serve with GRPO.


Note

Medium Risk
Adds new CLI arguments that are forwarded into vLLM engine initialization; risk is mainly version/compatibility issues if installed vLLM lacks these constructor params or if invalid JSON is provided.

Overview
trl vllm-serve now supports reasoning configuration for thinking models via new --reasoning_parser and --reasoning_config options on ScriptArguments.

These values are forwarded into the vLLM LLM(...) constructor (with reasoning_config parsed from JSON), enabling thinking_token_budget in downstream generation_kwargs when serving models like Qwen/DeepSeek.
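The invalid-JSON risk flagged above can be contained by validating the flag value before engine construction. A hedged sketch (the function name and error messages are mine, not from the PR):

```python
import json


def parse_reasoning_config(raw: str) -> dict:
    """Parse the --reasoning_config CLI string, failing fast with a clear
    message instead of surfacing a raw JSONDecodeError at engine init."""
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"--reasoning_config is not valid JSON: {exc}") from exc
    if not isinstance(cfg, dict):
        # vLLM expects a config object, not a bare list/scalar.
        raise ValueError("--reasoning_config must be a JSON object")
    return cfg
```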

