[WIP] Add DP-aware routing support to KVEvents and indexing pipeline #370
satyamg1620 wants to merge 1 commit into llm-d:main
Signed-off-by: satyamg1620 <Satyam.Gupta.3@ibm.com>
@vMaroon Could you please review this initial draft PR? Let me know if any changes are required.
@satyamg1620 thank you for the contribution! Generally I think the approach is correct - this is currently the only way to cover all deployments mentioned in https://docs.vllm.ai/en/stable/serving/data_parallel_deployment. Today in llm-d we only support the external LB mode, in which every rank is a separate deployment, and the pipeline works as follows:
For the other DP modes, though, there is one gap on the consumption side: the scheduler currently has nothing that connects actual DP-rank information with the port, so the bridge between the scheduler and the indexed data is missing. It would be great if you could prepare an overview of how llm-d would support each of the modes in the vLLM doc. Once there is a plan for this coverage, we can proceed in phases, where this PR is one phase and the scheduler updates are another.
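To make the gap concrete, here is a minimal sketch in Python of the kind of lookup the scheduler would need. Everything in it is hypothetical: `RankKey`, `RankPortRegistry`, the pod names, and the port numbers are illustrative assumptions, not part of llm-d, vLLM, or this PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RankKey:
    """Hypothetical index key: one entry per (pod, DP rank) pair."""
    pod: str
    dp_rank: int


class RankPortRegistry:
    """Hypothetical bridge from an indexed (pod, DP rank) to a routable port."""

    def __init__(self) -> None:
        self._ports: Dict[RankKey, int] = {}

    def register(self, pod: str, dp_rank: int, port: int) -> None:
        # Record which port serves a given DP rank on a given pod.
        self._ports[RankKey(pod, dp_rank)] = port

    def resolve(self, pod: str, dp_rank: int) -> Optional[int]:
        # None means no rank-to-port mapping is known for this rank.
        return self._ports.get(RankKey(pod, dp_rank))


# Assumed layout: two DP ranks on one pod, each behind its own port.
registry = RankPortRegistry()
registry.register("pod-a", 0, 8000)
registry.register("pod-a", 1, 8001)
```

With a mapping like this, an index entry tagged with a DP rank could be turned into a concrete endpoint; without it, the rank information in the index has no consumer on the scheduler side.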
Sure @vMaroon, I will prepare an overview of this.
Summary
This PR resolves issue #357