KV-events abstraction by NaomiEisen · Pull Request #356 · llm-d/llm-d-kv-cache

NaomiEisen · 2026-02-25T00:10:26Z

Overview

This PR introduces abstraction layers for KV-cache events. The refactoring separates transport protocols, serialization, and engine-specific event structures into distinct layers.

See the design docs for the full details.

Key Changes

New Abstraction Layers

Transport Layer (pkg/kvevents/transport/): Abstracts communication protocols.
Decoder Layer (pkg/kvevents/decoder/): Abstracts serialization formats.
Engine Adapter Layer (pkg/kvevents/engineadapter/): Converts engine-specific events to generic events.

Event Processing Refactor

Moved event processing logic into event structures: Each event type (BlockStoredEvent, BlockRemovedEvent, AllBlocksClearedEvent) now implements its own Process() method.
Removed double marshal/unmarshal: Events are decoded once by the adapter and passed as structured data to the pool.
Added ExtraKeys field to support vLLM's new event format (currently unused).

Testing

Tested on:

Unit tests: pkg/kvevents/engineadapter/vllm_adapter_test.go
tests/integration/kv_events_test.go
pkg/kvevents/subscriber_manager_test.go
Performance tests: Using this guide and comparing results against llm-d-inference-scheduler:v0.5.0.

github-actions · 2026-02-25T00:10:35Z

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

NaomiEisen · 2026-02-25T00:19:07Z

examples/helper/events.go

-		Medium:          &medium,
-		LoraName:        nil,
+
+	// Create event in vLLM msgpack array format: [tag, hashes, parent, tokens, blockSize, loraID, medium, loraName]


Previously, test events were created using specific event structures and then converted to a tagged union format via ToTaggedUnion(). This tagged union matched the exact format vLLM sends to llm-d. It was needed only because of the double marshaling: the first pass extracted the event type tag, and the second decoded the actual event data. Since the new code avoids the double marshaling, I removed ToTaggedUnion() entirely.
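A minimal sketch of single-pass decoding for the array format in the comment above (tag first, then positional fields). It assumes the msgpack payload has already been unmarshaled into []any, with unsigned integers as uint64 and block hashes in the uint64 form (not the newer []byte form); fromTaggedArray and the event structs are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type BlockStored struct {
	BlockHashes []uint64
	ParentHash  any // kept raw in this sketch
	BlockSize   int
}

type AllBlocksCleared struct{}

// fromTaggedArray converts one already-decoded event array into a typed event
// in a single pass: element 0 is the tag, the rest are positional fields.
func fromTaggedArray(arr []any) (any, error) {
	if len(arr) == 0 {
		return nil, errors.New("empty event array")
	}
	tag, ok := arr[0].(string)
	if !ok {
		return nil, fmt.Errorf("tag has type %T, want string", arr[0])
	}
	switch tag {
	case "BlockStored":
		// [tag, hashes, parent, tokens, blockSize, loraID, medium, loraName]
		if len(arr) < 5 {
			return nil, fmt.Errorf("BlockStored: got %d fields, want at least 5", len(arr))
		}
		rawHashes, ok := arr[1].([]any)
		if !ok {
			return nil, fmt.Errorf("BlockStored: hashes has type %T", arr[1])
		}
		hashes := make([]uint64, 0, len(rawHashes))
		for _, h := range rawHashes {
			u, ok := h.(uint64)
			if !ok {
				return nil, fmt.Errorf("BlockStored: hash has type %T", h)
			}
			hashes = append(hashes, u)
		}
		size, ok := arr[4].(uint64)
		if !ok {
			return nil, fmt.Errorf("BlockStored: block size has type %T", arr[4])
		}
		return &BlockStored{BlockHashes: hashes, ParentHash: arr[2], BlockSize: int(size)}, nil
	case "AllBlocksCleared":
		return &AllBlocksCleared{}, nil
	default:
		return nil, fmt.Errorf("unknown event tag %q", tag)
	}
}
```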

NaomiEisen · 2026-02-25T00:24:03Z

examples/kv_events/vllm/vllm_kv_cache_demo.py

        kv_events_config=kv_events_config,
        block_size=16,
-        prefix_caching_hash_algo="sha256_cbor",
+        prefix_caching_hash_algo="sha256_cbor_64bit",


Had this error when running the test:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda. usage: vllm serve [model_tag] [options] vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)

NaomiEisen · 2026-02-25T00:26:08Z

pkg/kvevents/engineadapter/vllm_adapter.go

+// getHashAsUint64 converts vLLM hash formats (uint64 or []byte) to uint64.
+// This handles both legacy uint64 hashes and new []byte hashes by taking
+// the last 8 bytes and interpreting them as a big-endian integer.
+func (v *VLLMAdapter) getHashAsUint64(raw any) (uint64, error) {


Maybe this should be a general utility function rather than a vLLM-specific one.
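If this moves to a shared utility, a minimal engine-agnostic version might look like the following. HashToUint64 is a hypothetical name; rejecting byte slices shorter than 8 bytes is an assumption, since the excerpt does not show how the PR handles them.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// HashToUint64 normalizes a block hash that arrives either as a uint64
// (legacy) or as a byte slice, using the last 8 bytes read as big-endian.
func HashToUint64(raw any) (uint64, error) {
	switch v := raw.(type) {
	case uint64:
		return v, nil
	case []byte:
		if len(v) < 8 {
			return 0, fmt.Errorf("hash has %d bytes, need at least 8", len(v))
		}
		return binary.BigEndian.Uint64(v[len(v)-8:]), nil
	default:
		return 0, fmt.Errorf("unsupported hash type %T", raw)
	}
}
```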

NaomiEisen · 2026-02-25T00:27:46Z

pkg/kvevents/engineadapter/vllm_adapter.go

+// parseVLLMTopic extracts pod ID and model name from vLLM topic format.
+// Expected format: "pod_id@model_name"
+// TODO: Find a way to avoid it
+func parseVLLMTopic(topic string) (podID, modelName string) {


I kept the same logic as before
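For reference, a minimal sketch of such a parser. The doc comment expects "pod_id@model_name", while the Helm template in vllm-setup-helm/templates/deployment.yaml publishes topics as "kv@${POD_IP}@<model>", so this hypothetical parseTopic strips an optional "kv@" prefix before splitting at the first "@". That tolerance is an assumption, not the PR's behavior.

```go
package main

import "strings"

// parseTopic extracts the pod identifier and model name from a topic such as
// "kv@10.0.0.5@meta-llama/Llama-3.1-8B" or "10.0.0.5@meta-llama/Llama-3.1-8B".
// Accepting the optional "kv@" prefix is an assumption based on the Helm template.
func parseTopic(topic string) (podID, modelName string, ok bool) {
	topic = strings.TrimPrefix(topic, "kv@")
	podID, modelName, ok = strings.Cut(topic, "@")
	if !ok || podID == "" || modelName == "" {
		return "", "", false
	}
	return podID, modelName, true
}
```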

NaomiEisen · 2026-02-25T00:30:45Z

pkg/kvevents/engineadapter/vllm_adapter.go

+	return &events.AllBlocksClearedEvent{}, nil
+}
+
+// TODO: not sure if it best to keep or remove these


I'm not sure whether it's better to hide the inner structures from the subscriber (so it only uses the adapter) or to have it call those methods directly on the transport.

NaomiEisen · 2026-02-25T00:39:52Z

examples/kv_events/pod_reconciler/pod_reconciler.go

 	}

 	// Check if pod matches our label selector
 	if !r.Config.PodLabelSelector.Matches(labels.Set(pod.Labels)) {


We might need to add the inference engine as one of the pod identifiers.

NaomiEisen · 2026-02-25T00:43:57Z

vllm-setup-helm/templates/deployment.yaml

              {{- if .Values.kvCacheManager.enabled }}
              --kv-events-config "{\"enable_kv_cache_events\":{{ .Values.kvCacheManager.enabled }},\"publisher\":\"zmq\",\"endpoint\":\"{{ include "chart.kvCacheManagerServiceUrl" . }}\",\"topic\":\"kv@${POD_IP}@{{ .Values.vllm.model.name }}\"}" \
-              --prefix-caching-hash-algo sha256_cbor \
+              --prefix-caching-hash-algo sha256_cbor_64bit \


Had this error:
INFO 02-24 02:10:17 [__init__.py:235] Automatically detected platform cuda. usage: vllm serve [model_tag] [options] vllm serve: error: argument --prefix-caching-hash-algo: invalid choice: 'sha256_cbor' (choose from builtin, sha256, sha256_cbor_64bit)

NaomiEisen · 2026-02-25T00:48:37Z

pkg/kvevents/subscriber.go

@@ -0,0 +1,145 @@
+// Copyright 2025 The llm-d Authors.


This file is very similar to the previous zmq_subscriber.go. I'm not sure why it's not showing as 'renamed' plus the changed lines. If it's difficult to compare, I can try to fix it.


sagearc

Hey @NaomiEisen, awesome work here! This is a big one though, 4 new packages plus the core refactor is a lot to review together. Can you split it into multiple, more focused PRs? Thanks!

NaomiEisen · 2026-03-02T18:59:09Z


Thanks for your response, and apologies for the inconvenience 🙏
I'm not sure how to split this into separate PRs, since the changes and new packages are closely tied together and I developed and tested them as a single unit to preserve the existing logic while introducing the new abstraction.

I'm concerned that splitting it might result in intermediate PRs that are not fully functional or don't make sense from a design perspective. Also, if I understood correctly, we'd like to get these changes ASAP, and I'm afraid that breaking them up might slow down the process (at least from my side).

That said, I'm happy to adjust if you have suggestions :)

vMaroon

Overall great work - the instinct and direction are right.

Before we think about splitting into multiple PRs, I think we should first tighten the abstractions. Seeing the actual code makes it clearer than the design doc did - the interface surface is wider than what the callers actually need. I left some comments, but the common thread is to design each interface from the caller's perspective, and let adapters own their internals privately.

In practice, this means focusing on the essence of your work: the contracts between the pool, engine adapters and subscribers.

vMaroon · 2026-03-02T19:56:52Z

pkg/kvevents/decoder/decoder.go

+limitations under the License.
+*/
+
+package decoder


This package wraps a single msgpack.Unmarshal call, and vllm_adapter.go still calls msgpack.Unmarshal directly in the converters. I think we can drop this package and have serialization remain an internal detail of each adapter.
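A minimal sketch of what keeping serialization private could look like: callers see only a bytes-in, events-out method, and the wire struct is unexported. Adapter, ParseBatch, and jsonAdapter are hypothetical names; encoding/json stands in for msgpack only so the example needs no third-party module.

```go
package main

import "encoding/json"

// Event is a hypothetical engine-agnostic event.
type Event struct {
	Kind   string
	Hashes []uint64
}

// Adapter is the caller-facing contract: bytes in, generic events out.
// The serialization format is not part of it.
type Adapter interface {
	ParseBatch(payload []byte) ([]Event, error)
}

// jsonAdapter keeps its wire format private (JSON stands in for msgpack).
type jsonAdapter struct{}

// wireEvent is the adapter's unexported, engine-specific shape.
type wireEvent struct {
	Type   string   `json:"type"`
	Hashes []uint64 `json:"hashes"`
}

func (jsonAdapter) ParseBatch(payload []byte) ([]Event, error) {
	var wire []wireEvent
	if err := json.Unmarshal(payload, &wire); err != nil {
		return nil, err
	}
	out := make([]Event, 0, len(wire))
	for _, w := range wire {
		out = append(out, Event{Kind: w.Type, Hashes: w.Hashes})
	}
	return out, nil
}
```

Swapping the wire format then only touches the adapter's unexported code, never its callers.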

vMaroon · 2026-03-02T20:05:23Z

pkg/kvevents/engineadapter/adapter.go

+// EngineAdapter defines the interface for engine-specific adapters.
+// Each inference engine has its own adapter implementation that handles
+// engine-specific operations.
+type EngineAdapter interface {


Transport() and Decode() are not used outside the adapter. As with the decoder, I think we can collapse the transport into the adapter until the separation is actually needed.

vMaroon · 2026-03-02T20:05:55Z

pkg/kvevents/engineadapter/adapter.go

+	DecodeMessageToEventBatch(msg *RawMessage) (*events.EventBatch, error)
+
+	// Connect establishes a connection to a remote endpoint.
+	Connect(ctx context.Context, endpoint string) error


Can we combine Connect, Bind and SubscribeToTopic into Setup(ctx, endpoint, topicFilter, remote bool) error?

vMaroon · 2026-03-02T20:07:18Z

pkg/kvevents/events/event.go

+	Type() EventType
+
+	// Process processes the event and updates the index.
+	Process(ctx context.Context, index kvblock.Index, tokenProcessor kvblock.TokenProcessor,


Having events know about kvblock.Index and TokenProcessor couples data to infra. What if a future engine needs different processing logic? Consider keeping events as pure data and letting the pool own the processing as it previously did.

The processing switch-case in pool.digestEvents was one function that kept the indexing coupling in one place - that's a tighter contract than distributing it across every event type.

vMaroon · 2026-03-02T20:18:44Z

pkg/kvevents/transport/transport.go

+
+// Transport defines the interface for receiving raw bytes from different
+// transport protocols (ZMQ, HTTP, gRPC, etc.).
+type Transport interface {


This looks like a ZMQ wrapper and is not used by the subscriber - which strengthens the point of collapsing it into the adapter.

vMaroon · 2026-03-02T20:22:46Z

pkg/kvevents/engineadapter/adapter.go

+	// Payload is the raw msgpack-encoded event batch bytes, not yet decoded.
+	Payload []byte
+	// Adapter is the engine adapter that can decode this payload.
+	Adapter EngineAdapter


I assume this was added to let the pool manage decoding of the payload? A pool already holds a reference to a single adapter type, so we can remove this circular dependency.
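A minimal sketch of the message without the back-reference: the pool is built with one adapter and decodes every queued message with it. The Topic field, Adapter, Pool, NewPool, and drainOnce are hypothetical; only Payload comes from the diff above.

```go
package main

import "errors"

// RawMessage carries only transport-level data, with no adapter reference.
type RawMessage struct {
	Topic   string
	Payload []byte
}

type Event struct{ Kind string }

// Adapter is a hypothetical decoding contract.
type Adapter interface {
	Decode(msg *RawMessage) ([]Event, error)
}

// Pool is configured with exactly one adapter and uses it for every message.
type Pool struct {
	adapter Adapter
	queue   chan *RawMessage
}

func NewPool(a Adapter, size int) *Pool {
	return &Pool{adapter: a, queue: make(chan *RawMessage, size)}
}

// drainOnce decodes one queued message with the pool's own adapter.
func (p *Pool) drainOnce() ([]Event, error) {
	select {
	case msg := <-p.queue:
		return p.adapter.Decode(msg)
	default:
		return nil, errors.New("queue empty")
	}
}

// echoAdapter treats the payload as the event kind.
type echoAdapter struct{}

func (echoAdapter) Decode(m *RawMessage) ([]Event, error) {
	return []Event{{Kind: string(m.Payload)}}, nil
}
```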

NaomiEisen requested review from dannyharnik, kfirtoledo and vMaroon as code owners February 25, 2026 00:10

vMaroon requested review from hyeongyun0916, liu-cong, sagearc and yankay February 25, 2026 00:10


NaomiEisen marked this pull request as draft February 25, 2026 00:50

Refactor kvevents: Introduce decoder, transport and engine adapter abstractions for multi-engine support. Modify Pool and Subscribers to use new layers.

6a6b5b9

NaomiEisen force-pushed the kvevents-abstraction branch from 6022335 to 6a6b5b9 Compare March 2, 2026 11:32

NaomiEisen marked this pull request as ready for review March 2, 2026 11:38

fix lint

e1f069c

sagearc suggested changes Mar 2, 2026


vMaroon requested changes Mar 2, 2026

