[WIP] Add DP-aware routing support to KVEvents and indexing pipeline #370

Draft
satyamg1620 wants to merge 1 commit into llm-d:main from satyamg1620:dp-aware-routing
Conversation

@satyamg1620

Summary

  • Propagate DataParallelRank from EventBatch through the KV event processing pipeline into PodEntry, enabling the index and scorer to distinguish KV blocks cached by different data-parallel ranks on the same pod.
  • Add DataParallelRank int field to PodEntry with sentinel value -1 (NoDataParallelRank) for backward compatibility with non-DP deployments.
  • Update LongestPrefixScorer to produce DP-aware scoring keys ("pod-1@dp0") so different ranks on the same pod receive independent scores.
  • Add optional int32 data_parallel_rank field to the PodScore proto message and update the gRPC server to populate it.
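The summary above can be sketched in Go. The type and function names below (`PodEntry`, `NoDataParallelRank`, `ScoringKey`) follow the PR description but are illustrative assumptions, not the actual llm-d code:

```go
package main

import "fmt"

// NoDataParallelRank is the sentinel for events that carry no DP-rank
// information, keeping non-DP deployments backward compatible.
// (Hypothetical sketch based on the PR summary, not the real tree.)
const NoDataParallelRank = -1

// PodEntry is a sketch of the indexed entry, extended with the
// DataParallelRank propagated from the EventBatch.
type PodEntry struct {
	PodIdentifier    string
	DataParallelRank int
}

// ScoringKey returns the key a prefix scorer would use. Different DP
// ranks on the same pod get distinct keys ("pod-1@dp0"), so they are
// scored independently; non-DP entries keep the plain pod identifier.
func (p PodEntry) ScoringKey() string {
	if p.DataParallelRank == NoDataParallelRank {
		return p.PodIdentifier
	}
	return fmt.Sprintf("%s@dp%d", p.PodIdentifier, p.DataParallelRank)
}

func main() {
	fmt.Println(PodEntry{"pod-1", 0}.ScoringKey())                  // pod-1@dp0
	fmt.Println(PodEntry{"pod-1", NoDataParallelRank}.ScoringKey()) // pod-1
}
```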

This PR resolves issue #357

Signed-off-by: satyamg1620 <Satyam.Gupta.3@ibm.com>
@satyamg1620
Author

@vMaroon, could you please review this initial draft PR? Let me know if any changes are required.

@vMaroon
Member

vMaroon commented Feb 28, 2026

@satyamg1620 thank you for the contribution!

Generally I think the approach is correct - this is currently the only way to cover all deployments mentioned in https://docs.vllm.ai/en/stable/serving/data_parallel_deployment. Today in llm-d we only support the external LB mode, in which every rank is a separate deployment, and the pipeline works as follows:

  1. Each rank publishes to a kv@<IP>:<PORT>@<MODEL> topic, and a PodIdentifier is then <IP>:<PORT>
  2. The scheduler treats each rank as a separate, normal endpoint, identified by <IP>:<PORT>
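The topic-to-identifier mapping in step 1 can be sketched as follows. This is a hypothetical helper, not code from the llm-d tree, and assumes the topic format `kv@<IP>:<PORT>@<MODEL>` described above:

```go
package main

import (
	"fmt"
	"strings"
)

// parsePodIdentifier extracts the <IP>:<PORT> pod identifier from a
// kv@<IP>:<PORT>@<MODEL> topic string. Illustrative sketch only; the
// real llm-d parsing code may differ.
func parsePodIdentifier(topic string) (string, error) {
	// SplitN with a limit of 3 tolerates '@' characters inside the
	// model segment, e.g. org/model@revision.
	parts := strings.SplitN(topic, "@", 3)
	if len(parts) != 3 || parts[0] != "kv" {
		return "", fmt.Errorf("unexpected topic format: %q", topic)
	}
	return parts[1], nil
}

func main() {
	id, err := parsePodIdentifier("kv@10.0.0.5:5557@meta-llama/Llama-3.1-8B")
	fmt.Println(id, err) // 10.0.0.5:5557 <nil>
}
```

With this layout each rank's endpoint is self-identifying, which is why the scheduler can treat every rank as a plain endpoint in step 2.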

For the other DP modes, however, there is a gap on the consumption side: the scheduler currently has nothing that connects actual DP-rank information to the port.

The bridge between the scheduler and the indexed data is missing here. It would be great if you could prepare an overview of how llm-d would support each of the modes in the vLLM doc. Once a plan for this kind of coverage is in place, we can proceed in a phased approach where this PR is one phase and the scheduler updates are another.

@satyamg1620
Copy link
Author

Sure @vMaroon, I will prepare an overview for the same.
