[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from max_draft_len #12341

Draft
zhaoyangwang-nvidia wants to merge 2 commits into NVIDIA:main from zhaoyangwang-nvidia:refactor-mtp-nlayers

@zhaoyangwang-nvidia
Collaborator

@zhaoyangwang-nvidia zhaoyangwang-nvidia commented Mar 19, 2026

…ax_draft_len

@coderabbitai summary

Description

The internal field num_nextn_predict_layers_from_model_config has been removed and replaced by num_nextn_predict_layers.

The original num_nextn_predict_layers field in MTPDecodingConfig conflated two separate concerns: how many draft tokens to produce and how many MTP layers the checkpoint contains. It is now split into two fields with clear responsibilities:

| Field | Source | Role |
| --- | --- | --- |
| max_draft_len | User-facing | Controls how many draft tokens to produce |
| num_nextn_predict_layers | Auto-populated from model (internal) | How many MTP layers actually exist in the checkpoint |
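
As a rough illustration of the split, here is a minimal sketch. It is not the actual MTPDecodingConfig: the class name `MTPConfigSketch`, the helper `populate_from_model`, and the idea that the layer count arrives as a `num_nextn_predict_layers` key in a plain dict are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MTPConfigSketch:
    """Hypothetical stand-in for the two fields in the table above."""

    # User-facing: how many draft tokens to produce (None = not set by the user).
    max_draft_len: Optional[int] = None
    # Internal: how many MTP layers exist in the checkpoint; filled in at load time.
    num_nextn_predict_layers: Optional[int] = None


def populate_from_model(config: MTPConfigSketch, model_config: dict) -> MTPConfigSketch:
    """Copy the layer count from a (hypothetical) model config dict."""
    config.num_nextn_predict_layers = int(model_config["num_nextn_predict_layers"])
    return config


# The user sets only max_draft_len; the layer count comes from the model.
cfg = populate_from_model(MTPConfigSketch(max_draft_len=3),
                          {"num_nextn_predict_layers": 1})
```

Keeping the internal field separate means a user-supplied value never overwrites what the checkpoint actually contains.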

Parameter Logic Per Mode

Eagle MTP (e.g. DeepSeek-V3, model has only 1 MTP layer)

• num_nextn_predict_layers = 1 (read from model)
• max_draft_len = N (set by user, default 1)
• Behavior: runs the single MTP layer N times, producing N draft tokens

Vanilla MTP (model has multiple MTP layers)

• num_nextn_predict_layers = M (read from model)
• User does not set max_draft_len → automatically uses M, runs all layers
• User explicitly sets max_draft_len = N:
	○ N < M: prints a warning, uses N layers, produces N draft tokens
	○ N >= M: uses M layers, produces M draft tokens
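
The per-mode rules above can be sketched as a single function. This is an illustrative reading of the description, not the code in this PR: the function name `resolve_draft_len`, the `eagle` flag, and the warning text are hypothetical.

```python
import warnings
from typing import Optional


def resolve_draft_len(num_layers: int, max_draft_len: Optional[int], eagle: bool) -> int:
    """Return the number of draft tokens produced per step.

    num_layers    -- MTP layers in the checkpoint (num_nextn_predict_layers)
    max_draft_len -- user setting, or None if the user did not set it
    eagle         -- True for Eagle MTP, False for Vanilla MTP
    """
    if eagle:
        # Eagle MTP: the single MTP layer is run N times; N defaults to 1.
        return 1 if max_draft_len is None else max_draft_len
    if max_draft_len is None:
        # Vanilla MTP, no user setting: run all M layers.
        return num_layers
    if max_draft_len < num_layers:
        # Vanilla MTP, N < M: warn and use only the first N layers.
        warnings.warn(
            f"max_draft_len={max_draft_len} is smaller than the "
            f"{num_layers} MTP layers in the model; using {max_draft_len} layers."
        )
        return max_draft_len
    # Vanilla MTP, N >= M: capped by the number of layers that exist.
    return num_layers
```

For example, `resolve_draft_len(1, 3, eagle=True)` follows the Eagle rule and yields 3 draft tokens from one layer, while `resolve_draft_len(3, 5, eagle=False)` is capped at the 3 layers in the checkpoint.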

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39550 [ run ] triggered by Bot. Commit: dd33dcc Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39558 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39558 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30775 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39583 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39583 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30795 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39665 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39665 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30869 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39717 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39717 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30914 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39735 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39735 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30930 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39785 [ run ] triggered by Bot. Commit: 5187058 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39785 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 3/21.

Link to invocation

…ax_draft_len

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39813 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39813 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30990 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39820 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39820 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30997 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39827 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation
