Draft: Linear attention support for KVCacheManager #12185
VALLIS-NERIA wants to merge 52 commits into NVIDIA:main from
…ock/mNextBlocks with lookup-node pointers. Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
6bf59b7 to
3312fa9
Compare
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
/bot run
502efa9 to
eebd047
Compare
/bot run
PR_Github #39595 [ run ] triggered by Bot. Commit:
PR_Github #39596 [ run ] triggered by Bot. Commit:
PR_Github #39595 [ run ] completed with state
PR_Github #39596 [ run ] completed with state
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
/bot run --stage-list "A10-PyTorch-2, DGX_B200-PyTorch-1, DGX_B200-PyTorch-2, DGX_H100-PyTorch-1, DGX_H100-PyTorch-3, L40S-PyTorch-1"
PR_Github #39890 [ run ] triggered by Bot. Commit:
…config
Replace Python floor division (//) with ceil_div when computing num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in get_bindings_model_config. This ensures correct sharding for models whose head counts are not evenly divisible by the parallelism factors.
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
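To illustrate the commit message above, here is a minimal sketch of why floor division under-allocates when a head count is not divisible by the parallelism factor. The `ceil_div` shown here is an assumed definition (the helper actually used in the PR is not shown in this conversation), and `num_heads` and `tp_size` are hypothetical values chosen for the example.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (assumed definition; a >= 0, b > 0)."""
    return -(-a // b)


# Hypothetical model and parallelism settings, not taken from the PR.
num_heads = 10
tp_size = 4

# Floor division drops the remainder, so the per-rank count times the
# number of ranks no longer covers every head.
floor_heads_per_rank = num_heads // tp_size

# Ceiling division rounds up, so every head is assigned to some rank
# (the last rank may be only partially filled).
ceil_heads_per_rank = ceil_div(num_heads, tp_size)

print(floor_heads_per_rank * tp_size >= num_heads)
print(ceil_heads_per_rank * tp_size >= num_heads)
```

When the count divides evenly, both forms give the same result, so the change only affects models with uneven head counts.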
PR_Github #39890 [ run ] completed with state
…gating_delta_rule_update
- Replace hardcoded stride(0)==1 check with is_contiguous() in causalConv1dUpdate
- Use explicit stride parameter (s_h0_0) instead of hardcoded HV*K*V for h0_source indexing in the triton kernel, enabling non-contiguous initial_state_source layouts
- Add int64 cast to prevent int32 overflow in index computation
- Add device_assert bounds check for h0_source store
- Add input_guard_exclude decorator to skip contiguous() on selected tensor arguments
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
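The stride and int64 items in the commit above can be sketched in plain Python. This is not the Triton kernel from the PR: the helper names and all sizes below (`HV`, `K`, `V`, the slot count, and the padded layout) are hypothetical, and 32-bit wraparound is simulated explicitly because Python integers do not overflow.

```python
def contiguous_strides(shape):
    """Element strides of a row-major (contiguous) tensor of this shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def wrap_int32(value):
    """Simulate truncating an integer to a signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical per-slot state dimensions.
HV, K, V = 32, 128, 128
slot_elems = HV * K * V

# Contiguous pool: the leading stride equals HV*K*V, so the hardcoded
# offset and the stride-based offset agree.
s_h0_0 = contiguous_strides((8, HV, K, V))[0]
slot = 3
assert slot * s_h0_0 == slot * slot_elems

# Hypothetical non-contiguous pool where each slot is followed by padding
# of equal size: the leading stride doubles and the hardcoded offset would
# address the wrong slot.
padded_s_h0_0 = 2 * slot_elems
hardcoded_offset = slot * slot_elems
strided_offset = slot * padded_s_h0_0

# With a large slot index the offset no longer fits in 32 bits, which is
# why the index is computed in 64-bit arithmetic.
big_slot = 5000
true_offset = big_slot * slot_elems
wrapped_offset = wrap_int32(true_offset)
print(true_offset, wrapped_offset)
```

Passing the stride as a kernel argument keeps the indexing correct for both layouts, at the cost of one extra parameter.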
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Removed bounds checking assertion for h0_source index.
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
… into user/xiweny/linear_reuse_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…ser/xiweny/linear_reuse_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
/bot run --disable-fail-fast
PR_Github #39938 [ run ] triggered by Bot. Commit:
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
8fed1c2 to
b31dd85
Compare
PR_Github #39938 [ run ] completed with state
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
/bot run
PR_Github #40065 [ run ] triggered by Bot. Commit:
PR_Github #40065 [ run ] completed with state
/bot run
PR_Github #40106 [ run ] triggered by Bot. Commit:
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
…use_new Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
PR_Github #40106 [ run ] completed with state
/bot run
PR_Github #40140 [ run ] triggered by Bot. Commit:
@coderabbitai summary
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.