Conversation
|
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation. |
| Medium: &medium, | ||
| LoraName: nil, | ||
|
|
||
| // Create event in vLLM msgpack array format: [tag, hashes, parent, tokens, blockSize, loraID, medium, loraName] |
There was a problem hiding this comment.
Previously, test events were created using specific event structures and then converted to a tagged union format via ToTaggedUnion(). This tagged union matched the exact format vllm sends to llm-d. The tagged union structure was necessary because of double marshaling: first to extracted the event type tag, and the second for the actual event data. I avoided it so I completely removed the ToTaggedUnion().
| kv_events_config=kv_events_config, | ||
| block_size=16, | ||
| prefix_caching_hash_algo="sha256_cbor", | ||
| prefix_caching_hash_algo="sha256_cbor_64bit", |
There was a problem hiding this comment.
Had this error when running the test:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda. usage: vllm serve [model_tag] [options] vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)
| // getHashAsUint64 converts vLLM hash formats (uint64 or []byte) to uint64. | ||
| // This handles both legacy uint64 hashes and new []byte hashes by taking | ||
| // the last 8 bytes and interpreting them as a big-endian integer. | ||
| func (v *VLLMAdapter) getHashAsUint64(raw any) (uint64, error) { |
There was a problem hiding this comment.
Maybe it should be a general/utility function rather than 'vllm-specific'
| // parseVLLMTopic extracts pod ID and model name from vLLM topic format. | ||
| // Expected format: "pod_id@model_name" | ||
| // TODO: Find a way to avoid it | ||
| func parseVLLMTopic(topic string) (podID, modelName string) { |
There was a problem hiding this comment.
I kept the same logic as before
| return &events.AllBlocksClearedEvent{}, nil | ||
| } | ||
|
|
||
| // TODO: not sure if it best to keep or remove these |
There was a problem hiding this comment.
I'm not sure whether it's better to abstract the inner structures from the subscriber (so it only uses the adapter) or to make it use those methods directly from the transport
| } | ||
|
|
||
| // Check if pod matches our label selector | ||
| if !r.Config.PodLabelSelector.Matches(labels.Set(pod.Labels)) { |
There was a problem hiding this comment.
We might need to introduce an inference engine as one of the pods identifiers
| {{- if .Values.kvCacheManager.enabled }} | ||
| --kv-events-config "{\"enable_kv_cache_events\":{{ .Values.kvCacheManager.enabled }},\"publisher\":\"zmq\",\"endpoint\":\"{{ include "chart.kvCacheManagerServiceUrl" . }}\",\"topic\":\"kv@${POD_IP}@{{ .Values.vllm.model.name }}\"}" \ | ||
| --prefix-caching-hash-algo sha256_cbor \ | ||
| --prefix-caching-hash-algo sha256_cbor_64bit \ |
There was a problem hiding this comment.
Had this error:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda. usage: vllm serve [model_tag] [options] vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)
pkg/kvevents/subscriber.go
Outdated
| @@ -0,0 +1,145 @@ | |||
| // Copyright 2025 The llm-d Authors. | |||
There was a problem hiding this comment.
This file is very similar to the previous zmq_subscriber.go. I'm not sure why it's not just showing as 'renamed' + the changed lines. If it's difficult to compare, I can try to fix it
…stractions for multi-engine support. Modify Pool and Subscribers to use new layers.
6022335 to
6a6b5b9
Compare
sagearc
left a comment
There was a problem hiding this comment.
Hey @NaomiEisen, awesome work here! This is a big one though, 4 new packages plus the core refactor is a lot to review together. Can you split it into multiple, more focused PRs? Thanks!
Thanks for your response, and apologies for the inconvenience 🙏 I'm concerned that splitting it might result in intermediate PRs that are not fully functional or don't make sense from a design perspective. Also, if I understood correctly, we'd like to get these changes ASAP, and I'm afraid that breaking them up might slow down the process (at least from my side). That said, I'm happy to adjust if you have suggestions :) |
vMaroon
left a comment
There was a problem hiding this comment.
Overall great work - the instinct and direction are right.
Before we think about splitting into multiple PRs, I think we should first tighten the abstractions. Seeing the actual code makes it clearer than the design doc did - the interface surface is wider than what the callers actually need. I left some comments, but the common thread is to design each interface from the caller's perspective, and let adapters own their internals privately.
Practically this means to focus on the essence of the work you made: the contracts between the pool, engine adapters and subscribers.
| limitations under the License. | ||
| */ | ||
|
|
||
| package decoder |
There was a problem hiding this comment.
This package wraps a single msgpack.Unmarshal call, and vllm_adapter.go still calls msgpack.Unmarshal directly in the converters. I think we can drop this package and have serialization remain an internal detail of each adapter.
| // EngineAdapter defines the interface for engine-specific adapters. | ||
| // Each inference engine has its own adapter implementation that handles | ||
| // engine-specific operations. | ||
| type EngineAdapter interface { |
There was a problem hiding this comment.
Transport() and Decode() are not used outside the adapter. I think similarly to the decoder, we can maybe collapse the transport into the adapter until the separation is needed.
| DecodeMessageToEventBatch(msg *RawMessage) (*events.EventBatch, error) | ||
|
|
||
| // Connect establishes a connection to a remote endpoint. | ||
| Connect(ctx context.Context, endpoint string) error |
There was a problem hiding this comment.
Can we combine Connect, Bind and SubscribeToTopic into Setup(ctx, endpoint, topicFilter, remote bool) error?
| Type() EventType | ||
|
|
||
| // Process processes the event and updates the index. | ||
| Process(ctx context.Context, index kvblock.Index, tokenProcessor kvblock.TokenProcessor, |
There was a problem hiding this comment.
Having events know about kvblock.Index and TokenProcessor couples data to infra. What if a future engine needs different processing logic? Consider keeping events as pure data and letting the pool own the processing as it previously did.
The processing switch-case in pool.digestEvents was one function that kept the indexing coupling in one place - that's a tighter contract than distributing it across every event type.
|
|
||
| // Transport defines the interface for receiving raw bytes from different | ||
| // transport protocols (ZMQ, HTTP, gRPC, etc.). | ||
| type Transport interface { |
There was a problem hiding this comment.
This looks like a ZMQ wrapper and is not used by the subscriber - which strengthens the point of collapsing it into the adapter.
| // Payload is the raw msgpack-encoded event batch bytes, not yet decoded. | ||
| Payload []byte | ||
| // Adapter is the engine adapter that can decode this payload. | ||
| Adapter EngineAdapter |
There was a problem hiding this comment.
I assume that this was added here to let the pool manage the decoding of the payload? A pool already has a reference to a single adapter type, we can eject this circular dependency.
Overview
This PR introduces abstraction layers for KV-cache events. The refactoring separates transport protocols, serialization, and engine-specific event structure into distinct layers.
See design docs for full review.
Key Changes
New Abstraction Layers
pkg/kvevents/transport/): Abstracts communication protocols.pkg/kvevents/decoder/): Abstracts serialization formats.pkg/kvevents/engineadapter/): Converts engine specific events to generic events.Event Processing Refactor
BlockStoredEvent,BlockRemovedEvent,AllBlocksClearedEvent) now implements its ownProcess()method.ExtraKeysfield to support vLLM's new event format (currently unused).Testing
Tested on:
pkg/kvevents/engineadapter/vllm_adapter_test.gotests/integration/kv_events_test.gopkg/kvevents/subscriber_manager_test.gollm-d-inference-scheduler:v0.5.0.