Draft: Linear attention support for KVCacheManager by VALLIS-NERIA · Pull Request #12185 · NVIDIA/TensorRT-LLM

VALLIS-NERIA · 2026-03-13T05:19:42Z

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

VALLIS-NERIA · 2026-03-19T12:44:01Z

/bot run

VALLIS-NERIA · 2026-03-19T12:48:07Z

/bot run

tensorrt-cicd · 2026-03-19T12:50:20Z

PR_Github #39595 [ run ] triggered by Bot. Commit: eebd047 Link to invocation

tensorrt-cicd · 2026-03-19T12:53:54Z

PR_Github #39596 [ run ] triggered by Bot. Commit: eebd047 Link to invocation

tensorrt-cicd · 2026-03-19T12:54:11Z

PR_Github #39595 [ run ] completed with state ABORTED. Commit: eebd047

Link to invocation

tensorrt-cicd · 2026-03-19T14:08:39Z

PR_Github #39596 [ run ] completed with state FAILURE. Commit: eebd047
/LLM/main/L0_MergeRequest_PR pipeline #30809 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

VALLIS-NERIA · 2026-03-23T06:14:27Z

/bot run --stage-list "A10-PyTorch-2, DGX_B200-PyTorch-1, DGX_B200-PyTorch-2, DGX_H100-PyTorch-1, DGX_H100-PyTorch-3, L40S-PyTorch-1"

tensorrt-cicd · 2026-03-23T06:21:05Z

PR_Github #39890 [ run ] triggered by Bot. Commit: 325e454 Link to invocation

…config

Replace Python floor division (//) with ceil_div when computing num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in get_bindings_model_config. This ensures correct sharding for models whose head counts are not evenly divisible by the parallelism factors.

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
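The effect of the rounding direction can be illustrated with a minimal sketch. The `ceil_div` helper below is a stand-in written for this example, not necessarily the project's own implementation, and the head count and the `tp_size` parallelism factor are made-up values.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (illustrative stand-in helper)."""
    return -(a // -b)


# Hypothetical model: 10 attention heads split across 4 ranks.
num_heads, tp_size = 10, 4

floor_per_rank = num_heads // tp_size         # rounds down
ceil_per_rank = ceil_div(num_heads, tp_size)  # rounds up

# Total head slots the per-rank size can cover across all ranks.
covered_floor = floor_per_rank * tp_size
covered_ceil = ceil_per_rank * tp_size

print(f"floor: {floor_per_rank} per rank, covers {covered_floor} of {num_heads}")
print(f"ceil:  {ceil_per_rank} per rank, covers {covered_ceil} of {num_heads}")
```

With floor division the per-rank size undercounts whenever the division leaves a remainder, so some heads would have no slot; rounding up over-allocates slightly but never drops a head. When the count divides evenly, both give the same result.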

tensorrt-cicd · 2026-03-23T08:53:13Z

PR_Github #39890 [ run ] completed with state SUCCESS. Commit: 325e454
/LLM/main/L0_MergeRequest_PR pipeline #31060 (Partly Tested) completed with status: 'FAILURE'

Link to invocation

…gating_delta_rule_update

- Replace hardcoded stride(0)==1 check with is_contiguous() in causalConv1dUpdate
- Use explicit stride parameter (s_h0_0) instead of hardcoded HV*K*V for h0_source indexing in the triton kernel, enabling non-contiguous initial_state_source layouts
- Add int64 cast to prevent int32 overflow in index computation
- Add device_assert bounds check for h0_source store
- Add input_guard_exclude decorator to skip contiguous() on selected tensor arguments

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
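The two indexing changes in this commit message can be sketched in plain Python. This is an illustration under stated assumptions, not the kernel code: Python integers stand in for Triton index arithmetic, the shapes are invented, `s_h0_0` is read here as the stride of `h0_source` along its first dimension, and `flat_offset` and `wrap_int32` are hypothetical helpers.

```python
def flat_offset(slot: int, stride0: int) -> int:
    """Element offset of state slot `slot`, given the stride along dim 0."""
    return slot * stride0


def wrap_int32(x: int) -> int:
    """Emulate two's-complement wraparound of a 32-bit signed integer."""
    return (x + 2**31) % 2**32 - 2**31


# Hypothetical per-slot state shape [HV, K, V].
HV, K, V = 32, 128, 128
dense_stride = HV * K * V  # 524288: slot stride of a contiguous buffer

# Assumed non-contiguous layout: the state is a view into a larger pool,
# so consecutive slots are twice as far apart as in the dense case.
pool_stride = 2 * dense_stride

slot = 3
hardcoded = slot * dense_stride           # only valid for the dense layout
strided = flat_offset(slot, pool_stride)  # follows the actual layout

# Overflow: a large slot index times the stride exceeds the int32 range.
big_slot = 5000
exact = flat_offset(big_slot, dense_stride)  # what a 64-bit index holds
wrapped = wrap_int32(exact)                  # what a 32-bit index would hold

print(hardcoded, strided)
print(exact, wrapped)
```

Passing the stride explicitly keeps the offset correct for both layouts, and the wrapped value going negative shows why the index is widened to 64 bits before the multiplication.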

VALLIS-NERIA · 2026-03-23T13:07:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-23T13:14:12Z

PR_Github #39938 [ run ] triggered by Bot. Commit: 1c83f1e Link to invocation

tensorrt-cicd · 2026-03-24T02:22:36Z

PR_Github #39938 [ run ] completed with state SUCCESS. Commit: 1c83f1e
/LLM/main/L0_MergeRequest_PR pipeline #31105 completed with status: 'FAILURE'

Link to invocation

VALLIS-NERIA · 2026-03-24T05:43:10Z

/bot run

tensorrt-cicd · 2026-03-24T05:49:28Z

PR_Github #40065 [ run ] triggered by Bot. Commit: 12d8dda Link to invocation

tensorrt-cicd · 2026-03-24T09:41:33Z

PR_Github #40065 [ run ] completed with state SUCCESS. Commit: 12d8dda
/LLM/main/L0_MergeRequest_PR pipeline #31219 completed with status: 'FAILURE'

Link to invocation

VALLIS-NERIA · 2026-03-24T10:03:47Z

/bot run

tensorrt-cicd · 2026-03-24T10:09:36Z

PR_Github #40106 [ run ] triggered by Bot. Commit: 12d8dda Link to invocation

tensorrt-cicd · 2026-03-24T15:01:06Z

PR_Github #40106 [ run ] completed with state SUCCESS. Commit: 12d8dda
/LLM/main/L0_MergeRequest_PR pipeline #31257 completed with status: 'FAILURE'

Link to invocation

VALLIS-NERIA · 2026-03-24T15:01:21Z

/bot run

tensorrt-cicd · 2026-03-24T15:07:52Z

PR_Github #40140 [ run ] triggered by Bot. Commit: 4717171 Link to invocation

SimengLiu-nv and others added 10 commits March 5, 2026 15:29

[None][feat] Wire KVCacheBlock to UnifiedBlockTree, replacing mPrevBl…

bdb3791

…ock/mNextBlocks with lookup-node pointers. Signed-off-by: SimengLiu-nv <simengl@nvidia.com>

Address comments.

bd4810a

Signed-off-by: SimengLiu-nv <simengl@nvidia.com>

block allocation and reusing works for linear attention

27574b9

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

copy states during context shifts

3543bbe

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

fix corner cases

36aa474

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

temp stage: accuracy w/o reuse ok

cd1a67b

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

temp stage: accuracy with reuse ok

94d4312

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

fix merge conflicts

d885842

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into pr-11919

603d822

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

temporary stage

b398561

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

github-actions bot assigned VALLIS-NERIA Mar 13, 2026

VALLIS-NERIA added 7 commits March 14, 2026 11:20

fix multiple issues

df7284a

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…

ce9674a

…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

use pre calculated buffers

cab2412

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…

a1889b8

…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

scheduler support

22e7fd2

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

FIFO placeholder management

6475692

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

remove debug prints in module/op level

3312fa9

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

VALLIS-NERIA force-pushed the user/xiweny/linear_reuse_new branch from 6bf59b7 to 3312fa9 Compare March 18, 2026 14:13

VALLIS-NERIA added 3 commits March 18, 2026 22:15

change memory layout to layer first

9b73cbf

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

fix scheduler

efbb815

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

auto choose mamba cache manager impl

aa15395

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

VALLIS-NERIA force-pushed the user/xiweny/linear_reuse_new branch from 502efa9 to eebd047 Compare March 19, 2026 12:47

format code

5bfda48

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

VALLIS-NERIA added 6 commits March 23, 2026 16:54

fix memory usage and model_config check

398495f

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Remove index bounds checking in h0_source store

41f1b77

Removed bounds checking assertion for h0_source index. Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'fork/user/xiweny/ceil_div_model_config'…

1281d40

… into user/xiweny/linear_reuse_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'fork/user/xiweny/stride_support' into u…

96c62e7

…ser/xiweny/linear_reuse_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

refine evictionpolicy

1c83f1e

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

VALLIS-NERIA added 3 commits March 24, 2026 09:08

refine mamba cache manager

7dacd9e

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…

ab3bc32

…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

clean up unnecessary changes

b31dd85

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

VALLIS-NERIA force-pushed the user/xiweny/linear_reuse_new branch from 8fed1c2 to b31dd85 Compare March 24, 2026 02:21

VALLIS-NERIA added 2 commits March 24, 2026 13:39

fix

81eb415

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

add tests for scheduler

12d8dda

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

fix

45f0fa5

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

VALLIS-NERIA added 2 commits March 24, 2026 22:59

fix kvcache manager ut

a8dea92

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…

4717171

…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

3 participants