
feat(vllm-serve): add --reasoning-parser and --reasoning-config flags#5672

Open
kfirah-create wants to merge 1 commit into huggingface:main from kfirah-create:feat/vllm-serve-reasoning-config

Conversation


@kfirah-create kfirah-create commented Apr 28, 2026

What does this PR do?

trl vllm-serve currently doesn't expose vLLM's --reasoning-parser or --reasoning-config options. When using thinking models (Qwen3.5, DeepSeek-R1, etc.) with GRPO and thinking_token_budget in generation_kwargs, the vLLM engine returns a 400 error:

thinking_token_budget is set but reasoning_config is not configured. Please set --reasoning-config to use thinking_token_budget.

This is because vLLM requires a ReasoningConfig on the engine to activate its ThinkingTokenBudgetLogitsProcessor, but trl vllm-serve has no way to set it.

This PR adds two new ScriptArguments fields in vllm_serve.py:

  • reasoning_parser — e.g., "qwen3", "deepseek_r1" — forwarded to LLM(reasoning_parser=...)
  • reasoning_config — JSON string forwarded to LLM(reasoning_config=...) for explicit delimiter control

Both are passed through to the LLM() constructor, enabling thinking_token_budget in generation_kwargs to work end-to-end.
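A minimal sketch of the pass-through described above (field names match this PR; the exact structure of vllm_serve.py, the help text, and the kwargs dict are illustrative assumptions, not the actual diff):

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScriptArguments:
    # New fields mirroring the flags added in this PR; help text is illustrative.
    reasoning_parser: Optional[str] = field(
        default=None,
        metadata={"help": "vLLM reasoning parser name, e.g. 'qwen3' or 'deepseek_r1'."},
    )
    reasoning_config: Optional[str] = field(
        default=None,
        metadata={"help": "JSON string forwarded to vLLM as reasoning_config."},
    )


def build_reasoning_kwargs(args: ScriptArguments) -> dict:
    """Collect only the kwargs that were actually set, so the LLM() call
    is unchanged for users who never pass the new flags."""
    kwargs = {}
    if args.reasoning_parser is not None:
        kwargs["reasoning_parser"] = args.reasoning_parser
    if args.reasoning_config is not None:
        # The PR forwards reasoning_config parsed from its JSON string form.
        kwargs["reasoning_config"] = json.loads(args.reasoning_config)
    return kwargs
```

These kwargs would then be splatted into the existing `LLM(...)` call, leaving default behavior untouched when neither flag is given.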

Example usage:

    trl vllm-serve \
        --model Qwen/Qwen3.5-9B \
        --reasoning_parser qwen3

Then in the training script:

    GRPOConfig(
        ...,
        generation_kwargs={"thinking_token_budget": 1024},
    )

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

AI writing disclosure

We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.

  • No AI usage: the PR was written entirely by a human.
  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.
  • AI-generated: the PR was mostly or fully generated by an AI tool.

Who can review?

@qgallouedec — this enables thinking_token_budget support for thinking models (Qwen3.5, DeepSeek-R1) when using trl vllm-serve with GRPO.


Note

Medium Risk
Adds new CLI arguments that are forwarded into vLLM engine initialization; risk is mainly version/compatibility issues if installed vLLM lacks these constructor params or if invalid JSON is provided.

Overview
trl vllm-serve now supports reasoning configuration for thinking models via new --reasoning_parser and --reasoning_config options on ScriptArguments.

These values are forwarded into the vLLM LLM(...) constructor (with reasoning_config parsed from JSON), enabling thinking_token_budget in downstream generation_kwargs when serving models like Qwen/DeepSeek.
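The invalid-JSON risk flagged above can be contained by validating the flag value before engine construction. A hedged sketch (the function name and error messages are mine, not from the PR):

```python
import json


def parse_reasoning_config(raw: str) -> dict:
    """Parse the --reasoning_config CLI string, failing fast with a clear
    message instead of surfacing a raw JSONDecodeError at engine init."""
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"--reasoning_config is not valid JSON: {exc}") from exc
    if not isinstance(cfg, dict):
        # vLLM expects a config object, not a bare list/scalar.
        raise ValueError("--reasoning_config must be a JSON object")
    return cfg
```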

