Draft
52 commits
bdb3791
[None][feat] Wire KVCacheBlock to UnifiedBlockTree, replacing mPrevBl…
SimengLiu-nv Mar 4, 2026
bd4810a
Address comments.
SimengLiu-nv Mar 5, 2026
27574b9
block allocation and reusing works for linear attention
VALLIS-NERIA Jan 28, 2026
3543bbe
copy states during context shifts
VALLIS-NERIA Jan 30, 2026
36aa474
fix corner cases
VALLIS-NERIA Feb 4, 2026
cd1a67b
temp stage: accuracy w/o reuse ok
VALLIS-NERIA Mar 2, 2026
94d4312
temp stage: accuracy with reuse ok
VALLIS-NERIA Mar 2, 2026
d885842
fix merge conflicts
VALLIS-NERIA Mar 9, 2026
603d822
Merge remote-tracking branch 'origin/main' into pr-11919
VALLIS-NERIA Mar 9, 2026
b398561
temporary stage
VALLIS-NERIA Mar 13, 2026
df7284a
fix multiple issues
VALLIS-NERIA Mar 14, 2026
ce9674a
Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…
VALLIS-NERIA Mar 14, 2026
cab2412
use pre calculated buffers
VALLIS-NERIA Mar 14, 2026
a1889b8
Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…
VALLIS-NERIA Mar 17, 2026
22e7fd2
scheduler support
VALLIS-NERIA Mar 18, 2026
6475692
FIFO placeholder management
VALLIS-NERIA Mar 18, 2026
3312fa9
remove debug prints in module/op level
VALLIS-NERIA Mar 18, 2026
9b73cbf
change memory layout to layer first
VALLIS-NERIA Mar 18, 2026
efbb815
fix scheduler
VALLIS-NERIA Mar 18, 2026
aa15395
auto choose mamba cache manager impl
VALLIS-NERIA Mar 19, 2026
5bfda48
format code
VALLIS-NERIA Mar 19, 2026
f9e2ad0
fix unhandled kFORCE_CHUNK enum in switch statement
VALLIS-NERIA Mar 20, 2026
1810dba
fix config of current implementation
VALLIS-NERIA Mar 20, 2026
cf50425
merge upstream main and resolve conflicts
VALLIS-NERIA Mar 20, 2026
4dd57bf
fix missing is_nemotron_hybrid/is_qwen3_hybrid imports
VALLIS-NERIA Mar 20, 2026
b4e54e7
remove some hacks
VALLIS-NERIA Mar 20, 2026
ee0b690
[Agent fix] restore block reuse defaults and fix AutoDeploy mamba_lay…
VALLIS-NERIA Mar 20, 2026
782c46f
Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…
VALLIS-NERIA Mar 21, 2026
75b9438
Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…
VALLIS-NERIA Mar 22, 2026
c27c351
revert to use old mambacachemanager as default
VALLIS-NERIA Mar 22, 2026
850bd66
[Agent fix] Remove debug prints, commented debug code, and tensor dum…
VALLIS-NERIA Mar 22, 2026
b0921fb
fix not mine unit tests
VALLIS-NERIA Mar 22, 2026
7f03f58
temporary disable my unit tests to run CI
VALLIS-NERIA Mar 22, 2026
d020bf6
Revert "revert to use old mambacachemanager as default"
VALLIS-NERIA Mar 22, 2026
98da518
only auto-deploy uses old mambacachemanager & fix beam search
VALLIS-NERIA Mar 22, 2026
eb0044d
use ceil div for head split
VALLIS-NERIA Mar 23, 2026
325e454
get rid of model_config
VALLIS-NERIA Mar 23, 2026
27ef0bf
[TRTLLM-10061][fix] Use ceil_div for head/size calculations in model_…
VALLIS-NERIA Mar 23, 2026
8620daa
[TRTLLM-10061][feat] Add stride support for conv1d and fused_sigmoid_…
VALLIS-NERIA Mar 23, 2026
398495f
fix memory usage and model_config check
VALLIS-NERIA Mar 23, 2026
41f1b77
Remove index bounds checking in h0_source store
VALLIS-NERIA Mar 23, 2026
1281d40
Merge remote-tracking branch 'fork/user/xiweny/ceil_div_model_config'…
VALLIS-NERIA Mar 23, 2026
96c62e7
Merge remote-tracking branch 'fork/user/xiweny/stride_support' into u…
VALLIS-NERIA Mar 23, 2026
1c83f1e
refine evictionpolicy
VALLIS-NERIA Mar 23, 2026
7dacd9e
refine mamba cache manager
VALLIS-NERIA Mar 24, 2026
ab3bc32
Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…
VALLIS-NERIA Mar 24, 2026
b31dd85
clean up unnecessary chagnes
VALLIS-NERIA Mar 24, 2026
81eb415
fix
VALLIS-NERIA Mar 24, 2026
12d8dda
add tests for scheduler
VALLIS-NERIA Mar 24, 2026
45f0fa5
fix
VALLIS-NERIA Mar 24, 2026
a8dea92
fix kvcache manager ut
VALLIS-NERIA Mar 24, 2026
4717171
Merge remote-tracking branch 'origin/main' into user/xiweny/linear_re…
VALLIS-NERIA Mar 24, 2026
50 changes: 45 additions & 5 deletions cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h
@@ -33,13 +33,14 @@ class BaseEvictionPolicy

// TODO(TRTLLM-1564): Don't use a separate `initialize` function. Ensure eviction policies can't be in-between a
// state of construction and initialization.
- virtual void initialize(std::vector<BlockPtr>& mAllBlocksById, std::vector<SizeType32> sizes,
+ virtual void initialize(std::vector<BlockPtr>& mAllBlocksById, std::vector<SizeType32> blocksPerCacheLevel,
std::optional<executor::RetentionPriority> secondaryOffloadMinPriority)
= 0;

/// @brief Get a free block from the specified cache level
/// @returns The pointer to the free block, along with whether it can be offloaded
- virtual std::tuple<BlockPtr, bool> getFreeBlock(SizeType32 cacheLevel) = 0;
+ /// @param wantPlaceholder If true, return a placeholder block instead of a normal block
+ virtual std::tuple<BlockPtr, bool> getFreeBlock(SizeType32 cacheLevel, bool wantPlaceholder = false) = 0;
/// @brief Release a block. Prioritize the block for eviction if toFront=true
virtual void releaseBlock(BlockPtr block) = 0;
virtual void releaseBlock(BlockPtr block, bool toFront) = 0;
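
Because `wantPlaceholder` defaults to `false`, existing one-argument calls to `getFreeBlock` should keep compiling and keep requesting normal blocks. C++ picks default arguments from the static type of the call expression, so it matters that the base declaration and the override carry the same default, as they do in this diff. The sketch below illustrates that with hypothetical stand-ins: `BasePolicy`, `DemoPolicy`, and `requestViaBase` are not part of the PR, and the `std::string` return value replaces the real `std::tuple<BlockPtr, bool>` purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for BaseEvictionPolicy; only the changed signature is mirrored.
struct BasePolicy
{
    virtual ~BasePolicy() = default;
    // The default argument lets pre-existing one-argument call sites compile unchanged.
    virtual std::string getFreeBlock(int cacheLevel, bool wantPlaceholder = false) = 0;
};

// Hypothetical stand-in for a concrete policy.
struct DemoPolicy : BasePolicy
{
    // Repeats the same default as the base declaration, so the result does not
    // depend on whether the call goes through a base or a derived reference.
    std::string getFreeBlock(int cacheLevel, bool wantPlaceholder = false) override
    {
        (void) cacheLevel; // unused in this sketch
        return wantPlaceholder ? "placeholder" : "normal";
    }
};

// Calls through the base interface, the way a block manager would hold the policy.
inline std::string requestViaBase(BasePolicy& policy, bool wantPlaceholder)
{
    // The one-argument form relies on the default declared in BasePolicy.
    return wantPlaceholder ? policy.getFreeBlock(0, true) : policy.getFreeBlock(0);
}
```

Calling `requestViaBase(policy, false)` on a `DemoPolicy` takes the legacy one-argument path, while `requestViaBase(policy, true)` opts in to the placeholder path explicitly.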
@@ -70,9 +71,9 @@ struct ExpiringBlockComparator
class LRUEvictionPolicy : public BaseEvictionPolicy
{
public:
- void initialize(std::vector<BlockPtr>& mAllBlocksById, std::vector<SizeType32> sizes,
+ void initialize(std::vector<BlockPtr>& mAllBlocksById, std::vector<SizeType32> blocksPerCacheLevel,
std::optional<executor::RetentionPriority> secondaryOffloadMinPriority) override;
- std::tuple<BlockPtr, bool> getFreeBlock(SizeType32 cacheLevel) override;
+ std::tuple<BlockPtr, bool> getFreeBlock(SizeType32 cacheLevel, bool wantPlaceholder = false) override;

void releaseBlock(BlockPtr block) override;
void releaseBlock(BlockPtr block, bool toFront) override;
@@ -91,7 +92,15 @@ class LRUEvictionPolicy : public BaseEvictionPolicy

bool verifyQueueIntegrity() override;

- private:
+ protected:
+ /// @brief Map a block ID to the index into mFreeBlockIterators.
+ /// Default: identity (block IDs are 0-based non-negative integers).
+ /// Override for policies managing blocks with non-standard IDs (e.g. negative placeholder IDs).
+ virtual SizeType32 blockIdx(KVCacheBlock::IdType blockId) const
+ {
+ return blockId;
+ }
+
// Queues of available leaf blocks, split by cache level and priority level
std::vector<std::vector<FreeBlocksQueue>> mFreeQueues;
// Iterators to block entries in mFreeQueues
@@ -104,4 +113,35 @@ class LRUEvictionPolicy : public BaseEvictionPolicy
std::set<BlockPtr, ExpiringBlockComparator> mExpiringBlockHeap;
};

+ /// @brief Extends LRUEvictionPolicy to manage pre-allocated placeholder blocks via a dedicated inner
+ /// LRUEvictionPolicy (mPlaceholderEvictionPolicy). Placeholder blocks have negative IDs starting at -2.
+ /// Normal block operations are delegated to the base LRUEvictionPolicy; placeholder block operations
+ /// are delegated to mPlaceholderEvictionPolicy.
+ class MaybePlaceholderLRUEvictionPolicy : public LRUEvictionPolicy
+ {
+ public:
+ /// @brief Initialize the placeholder eviction policy with pre-allocated placeholder blocks.
+ /// @param allPlaceholderBlocksById Vector of placeholder blocks indexed by abs(blockId).
+ /// Indices 0 and 1 are unused (nullptr); index abs(blockId) holds the block with that ID.
+ /// @param numPlaceholderBlocks Number of placeholder blocks (determines valid index range [2,
+ /// numPlaceholderBlocks+1]).
+ /// @param secondaryOffloadMinPriority Secondary offload priority threshold (passed to inner policy).
+ void initializePlaceholders(std::vector<BlockPtr>& allPlaceholderBlocksById, SizeType32 numPlaceholderBlocks,
+ std::optional<executor::RetentionPriority> secondaryOffloadMinPriority);
+
+ std::tuple<BlockPtr, bool> getFreeBlock(SizeType32 cacheLevel, bool wantPlaceholder = false) override;
+
+ void releaseBlock(BlockPtr block, bool toFront) override;
+
+ void claimBlock(BlockPtr block, std::optional<executor::RetentionPriority> priority,
+ std::optional<std::chrono::milliseconds> durationMs) override;
+
+ void refresh() override;
+
+ bool verifyQueueIntegrity() override;
+
+ private:
+ std::shared_ptr<LRUEvictionPolicy> mPlaceholderEvictionPolicy;
+ };
+
} // namespace tensorrt_llm::batch_manager::eviction_policy
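
The doc comments on `blockIdx` and `initializePlaceholders` describe an indexing scheme in which placeholder IDs start at -2 and a block is stored at index `abs(blockId)`, leaving slots 0 and 1 empty. The sketch below works through that arithmetic under stated assumptions. `Block`, `placeholderIdx`, `makePlaceholderTable`, and `isPlaceholderId` are hypothetical names introduced here, and the ID-based routing rule is a guess at how the derived policy might choose between the base queues and `mPlaceholderEvictionPolicy`; the actual implementation is not shown in this header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for KVCacheBlock::IdType, SizeType32 and KVCacheBlock.
using IdType = std::int32_t;
using SizeType32 = std::int32_t;

struct Block
{
    IdType id;
};

using BlockPtr = std::shared_ptr<Block>;

// Assumed override of blockIdx for placeholders: IDs -2, -3, ... map to 2, 3, ...
inline SizeType32 placeholderIdx(IdType blockId)
{
    assert(blockId <= -2); // placeholder IDs start at -2 per the doc comment
    return static_cast<SizeType32>(-blockId);
}

// Builds the table the doc comment describes: slots 0 and 1 stay nullptr and
// slots [2, numPlaceholderBlocks + 1] hold the placeholder blocks.
inline std::vector<BlockPtr> makePlaceholderTable(SizeType32 numPlaceholderBlocks)
{
    std::vector<BlockPtr> table(static_cast<std::size_t>(numPlaceholderBlocks) + 2);
    for (SizeType32 i = 0; i < numPlaceholderBlocks; ++i)
    {
        IdType const id = -2 - i;
        table[static_cast<std::size_t>(placeholderIdx(id))] = std::make_shared<Block>(Block{id});
    }
    return table;
}

// Assumed routing rule: IDs of -2 or below belong to the inner placeholder policy.
inline bool isPlaceholderId(IdType blockId)
{
    return blockId <= -2;
}
```

With three placeholder blocks the table has five slots, and the block with ID -4 sits at index 4, the upper end of the documented range `[2, numPlaceholderBlocks+1]`.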