[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from max_draft_len by zhaoyangwang-nvidia · Pull Request #12341 · NVIDIA/TensorRT-LLM

zhaoyangwang-nvidia · 2026-03-19T06:14:29Z


Description

This PR removes the internal field num_nextn_predict_layers_from_model_config and replaces it with num_nextn_predict_layers.

The original num_nextn_predict_layers field in MTPDecodingConfig conflated two separate concerns: how many draft tokens to produce and how many MTP layers the model contains. It is now split into two fields with clear responsibilities:

| Field | Source | Role |
| --- | --- | --- |
| `max_draft_len` | User-facing | Controls how many draft tokens to produce |
| `num_nextn_predict_layers` | Auto-populated from model (internal) | How many MTP layers actually exist in the checkpoint |
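The split can be pictured with a minimal sketch. It assumes a plain dataclass rather than the real MTPDecodingConfig: the class name `MTPConfigSketch`, the helper `populate_from_model`, and the `None` defaults are all hypothetical and only illustrate which side owns which field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MTPConfigSketch:
    """Hypothetical stand-in for MTPDecodingConfig; not the real class."""

    # User-facing: how many draft tokens to produce. None means "not set by the user".
    max_draft_len: Optional[int] = None
    # Internal: number of MTP layers in the checkpoint, filled in at model load time.
    num_nextn_predict_layers: Optional[int] = None


def populate_from_model(cfg: MTPConfigSketch, layers_in_checkpoint: int) -> MTPConfigSketch:
    """Record the layer count read from the model without touching the user's setting."""
    cfg.num_nextn_predict_layers = layers_in_checkpoint
    return cfg


# The user asks for 3 draft tokens; the (assumed) checkpoint has a single MTP layer.
cfg = populate_from_model(MTPConfigSketch(max_draft_len=3), layers_in_checkpoint=1)
print(cfg)
```

The point of the sketch is that the user never writes `num_nextn_predict_layers`, and loading the model never overwrites `max_draft_len`.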

Parameter Logic Per Mode

Eagle MTP (e.g. DeepSeek-V3, model has only 1 MTP layer)

• num_nextn_predict_layers = 1 (read from model)
• max_draft_len = N (set by user, default 1)
• Behavior: runs the single MTP layer N times, producing N draft tokens
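The Eagle behavior above can be sketched as a loop that reuses one layer. This is a toy illustration, not the TensorRT-LLM implementation: `eagle_mtp_draft` and `toy_layer` are hypothetical names, and the "layer" here maps a token id to a token id, whereas a real MTP layer is a network forward pass that also consumes hidden states.

```python
from typing import Callable, List


def eagle_mtp_draft(mtp_layer: Callable[[int], int], last_token: int, max_draft_len: int) -> List[int]:
    """Apply the same MTP layer max_draft_len times, feeding each draft token back in."""
    drafts: List[int] = []
    token = last_token
    for _ in range(max_draft_len):
        token = mtp_layer(token)  # the single layer is reused on every step
        drafts.append(token)
    return drafts


def toy_layer(token: int) -> int:
    """Toy stand-in for an MTP layer: predicts the next token id as token + 1."""
    return token + 1


drafts = eagle_mtp_draft(toy_layer, last_token=10, max_draft_len=3)
print(drafts)  # [11, 12, 13]
```

With num_nextn_predict_layers = 1, the number of draft tokens is governed entirely by max_draft_len.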

Vanilla MTP (model has multiple MTP layers)

• num_nextn_predict_layers = M (read from model)
• User does not set max_draft_len → automatically uses M, runs all layers
• User explicitly sets max_draft_len = N:
	○ N < M: prints a warning, uses N layers, produces N draft tokens
	○ N >= M: uses M layers, produces M draft tokens
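The Vanilla MTP rules above amount to a small resolution step. The sketch below is an assumption about how that step could look: the function name `resolve_vanilla_draft_len`, the use of `None` for "not set", and the warning text are hypothetical and not taken from the PR's code.

```python
import warnings
from typing import Optional


def resolve_vanilla_draft_len(num_nextn_predict_layers: int,
                              max_draft_len: Optional[int] = None) -> int:
    """Return how many MTP layers to run (equal to the number of draft tokens)."""
    m = num_nextn_predict_layers
    if max_draft_len is None:
        # User did not set max_draft_len: run every layer in the checkpoint.
        return m
    if max_draft_len < m:
        # Fewer draft tokens requested than layers available: warn and use N layers.
        warnings.warn(
            f"max_draft_len={max_draft_len} is smaller than the {m} MTP layers "
            f"in the model; only {max_draft_len} layers will be used."
        )
        return max_draft_len
    # N >= M: the checkpoint cannot supply more than M layers.
    return m


print(resolve_vanilla_draft_len(3))     # 3
print(resolve_vanilla_draft_len(3, 5))  # 3
```

Calling `resolve_vanilla_draft_len(3, 2)` returns 2 and emits a `UserWarning`, matching the N < M case.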

Test Coverage


zhaoyangwang-nvidia · 2026-03-19T06:18:41Z

/bot run

tensorrt-cicd · 2026-03-19T06:24:26Z

PR_Github #39550 [ run ] triggered by Bot. Commit: dd33dcc Link to invocation

zhaoyangwang-nvidia · 2026-03-19T06:56:17Z

/bot run

tensorrt-cicd · 2026-03-19T07:01:50Z

PR_Github #39558 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

tensorrt-cicd · 2026-03-19T09:12:29Z

PR_Github #39558 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30775 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

zhaoyangwang-nvidia · 2026-03-19T09:17:45Z

/bot run

tensorrt-cicd · 2026-03-19T09:24:16Z

PR_Github #39583 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

tensorrt-cicd · 2026-03-19T11:18:56Z

PR_Github #39583 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30795 completed with status: 'FAILURE'


zhaoyangwang-nvidia · 2026-03-20T01:49:21Z

/bot run

tensorrt-cicd · 2026-03-20T01:54:56Z

PR_Github #39665 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

tensorrt-cicd · 2026-03-20T03:50:41Z

PR_Github #39665 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30869 completed with status: 'FAILURE'


zhaoyangwang-nvidia · 2026-03-20T08:08:18Z

/bot run

tensorrt-cicd · 2026-03-20T08:14:27Z

PR_Github #39717 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

tensorrt-cicd · 2026-03-20T11:40:48Z

PR_Github #39717 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30914 completed with status: 'FAILURE'


zhaoyangwang-nvidia · 2026-03-20T11:53:57Z

/bot run

tensorrt-cicd · 2026-03-20T11:59:36Z

PR_Github #39735 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

tensorrt-cicd · 2026-03-20T14:26:43Z

PR_Github #39735 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30930 completed with status: 'FAILURE'


zhaoyangwang-nvidia · 2026-03-21T03:45:56Z

/bot run

tensorrt-cicd · 2026-03-21T03:51:47Z

PR_Github #39785 [ run ] triggered by Bot. Commit: 5187058 Link to invocation

tensorrt-cicd · 2026-03-21T03:51:48Z

PR_Github #39785 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 3/21.

Link to invocation


zhaoyangwang-nvidia · 2026-03-22T08:22:20Z

/bot run

tensorrt-cicd · 2026-03-22T08:28:15Z

PR_Github #39813 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

tensorrt-cicd · 2026-03-22T10:21:33Z

PR_Github #39813 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30990 completed with status: 'FAILURE'


zhaoyangwang-nvidia · 2026-03-22T10:41:16Z

/bot run

tensorrt-cicd · 2026-03-22T10:46:50Z

PR_Github #39820 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

tensorrt-cicd · 2026-03-22T12:45:48Z

PR_Github #39820 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30997 completed with status: 'FAILURE'


zhaoyangwang-nvidia · 2026-03-22T14:02:59Z

/bot run

tensorrt-cicd · 2026-03-22T14:09:31Z

PR_Github #39827 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

github-actions bot assigned zhaoyangwang-nvidia Mar 19, 2026

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from da046c5 to 4289529 on March 20, 2026 08:08

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from 4289529 to 5187058 on March 21, 2026 03:45

zhaoyangwang-nvidia added 2 commits March 22, 2026 16:22

[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from max_draft_len

e02a7eb

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>

Remove some useless code

a79ca0f

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from 5187058 to a79ca0f on March 22, 2026 08:22
