Draft: Linear attention support for KVCacheManager #12185

Draft
VALLIS-NERIA wants to merge 52 commits into NVIDIA:main from VALLIS-NERIA:user/xiweny/linear_reuse_new

@VALLIS-NERIA
Collaborator

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

SimengLiu-nv and others added 10 commits March 5, 2026 15:29
…ock/mNextBlocks with lookup-node pointers.

Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA VALLIS-NERIA force-pushed the user/xiweny/linear_reuse_new branch from 6bf59b7 to 3312fa9 Compare March 18, 2026 14:13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run

@VALLIS-NERIA VALLIS-NERIA force-pushed the user/xiweny/linear_reuse_new branch from 502efa9 to eebd047 Compare March 19, 2026 12:47
@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39595 [ run ] triggered by Bot. Commit: eebd047 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39596 [ run ] triggered by Bot. Commit: eebd047 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39595 [ run ] completed with state ABORTED. Commit: eebd047

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39596 [ run ] completed with state FAILURE. Commit: eebd047
/LLM/main/L0_MergeRequest_PR pipeline #30809 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run --stage-list "A10-PyTorch-2, DGX_B200-PyTorch-1, DGX_B200-PyTorch-2, DGX_H100-PyTorch-1, DGX_H100-PyTorch-3, L40S-PyTorch-1"

@tensorrt-cicd
Collaborator

PR_Github #39890 [ run ] triggered by Bot. Commit: 325e454 Link to invocation

…config

Replace Python floor division (//) with ceil_div when computing
num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in
get_bindings_model_config. This ensures correct sharding for models
whose head counts are not evenly divisible by the parallelism factors.
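
To illustrate why the rounding direction matters, here is a minimal sketch. It is not the actual get_bindings_model_config code: the `ceil_div` helper shown here, the `per_rank_size` function, and the example sizes are assumptions made for illustration, and the real helper in TRT-LLM may be defined differently.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (assumes b > 0)."""
    return -(-a // b)


def per_rank_size(total: int, parallel_size: int) -> tuple[int, int]:
    """Return (floor, ceil) per-rank sizes so the two can be compared."""
    return total // parallel_size, ceil_div(total, parallel_size)


# Hypothetical model: 2 KV heads sharded across 4 tensor-parallel ranks.
floor_heads, ceil_heads = per_rank_size(2, 4)
print(floor_heads, ceil_heads)
```

In this hypothetical case floor division would report zero KV heads per rank, while rounding up gives each rank at least one. When the count divides evenly, both computations agree.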

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #39890 [ run ] completed with state SUCCESS. Commit: 325e454
/LLM/main/L0_MergeRequest_PR pipeline #31060 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…gating_delta_rule_update

- Replace hardcoded stride(0)==1 check with is_contiguous() in
  causalConv1dUpdate
- Use explicit stride parameter (s_h0_0) instead of hardcoded
  HV*K*V for h0_source indexing in the triton kernel, enabling
  non-contiguous initial_state_source layouts
- Add int64 cast to prevent int32 overflow in index computation
- Add device_assert bounds check for h0_source store
- Add input_guard_exclude decorator to skip contiguous() on
  selected tensor arguments
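
The indexing change in the second and third bullets can be sketched in plain Python. This is not the triton kernel itself: `wrap_int32`, `state_offset`, and all sizes below are hypothetical, and only the names `s_h0_0` and `HV*K*V` come from the commit message. The sketch shows that a hardcoded `HV*K*V` stride is only right for a dense layout, and that the offset product can leave the int32 range unless it is computed in 64 bits.

```python
def wrap_int32(x: int) -> int:
    """Emulate two's-complement int32 wraparound of an integer."""
    return (x + 2**31) % 2**32 - 2**31


def state_offset(slot: int, s_h0_0: int, use_int64: bool = True) -> int:
    """Element offset of a slot's state, given the stride of dimension 0.

    Python ints do not overflow, so the int32 case is emulated by wrapping.
    """
    offset = slot * s_h0_0
    return offset if use_int64 else wrap_int32(offset)


HV, K, V = 32, 128, 128              # hypothetical head/state dimensions
dense_stride = HV * K * V            # valid only for a contiguous buffer
padded_stride = dense_stride + 4096  # hypothetical non-contiguous layout

# With a padded layout, the hardcoded dense stride points at the wrong element.
assert state_offset(3, dense_stride) != state_offset(3, padded_stride)

# For a large slot index, 32-bit arithmetic wraps to a negative offset.
print(state_offset(5000, dense_stride, use_int64=False))
print(state_offset(5000, dense_stride, use_int64=True))
```

Passing the real stride as a parameter lets the same index computation serve both dense and padded buffers, which is the property the commit relies on for non-contiguous initial_state_source layouts.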

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Removed bounds checking assertion for h0_source index.

Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
… into user/xiweny/linear_reuse_new

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…ser/xiweny/linear_reuse_new

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39938 [ run ] triggered by Bot. Commit: 1c83f1e Link to invocation

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA VALLIS-NERIA force-pushed the user/xiweny/linear_reuse_new branch from 8fed1c2 to b31dd85 Compare March 24, 2026 02:21
@tensorrt-cicd
Collaborator

PR_Github #39938 [ run ] completed with state SUCCESS. Commit: 1c83f1e
/LLM/main/L0_MergeRequest_PR pipeline #31105 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40065 [ run ] triggered by Bot. Commit: 12d8dda Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40065 [ run ] completed with state SUCCESS. Commit: 12d8dda
/LLM/main/L0_MergeRequest_PR pipeline #31219 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40106 [ run ] triggered by Bot. Commit: 12d8dda Link to invocation

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #40106 [ run ] completed with state SUCCESS. Commit: 12d8dda
/LLM/main/L0_MergeRequest_PR pipeline #31257 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40140 [ run ] triggered by Bot. Commit: 4717171 Link to invocation

3 participants