
uds tokenizer using vllm wrapper #280

Open
delavet wants to merge 3 commits into llm-d:main from delavet:community/uds-tokenizer-using-vllm-wrapper

Conversation

@delavet
Collaborator

@delavet delavet commented Feb 3, 2026

fix #126

This PR updates how the disaggregated UDS tokenizer service works: the service now performs ApplyChatTemplate and Encode through the vLLM wrapper.

@hyeongyun0916
Collaborator

Regarding the current tasks, I’m working on PR #278 to address Issue #275.

How about we wait until PR #278 is merged before starting with the vLLM wrapper? I believe this sequence would be more efficient and help us avoid redundant work.

Also, I have a quick question: Will the 'uds tokenizer' be separate from the 'vLLM render' mentioned in this comment? Or are they expected to be the same thing? Thanks!

@delavet
Collaborator Author

delavet commented Feb 3, 2026

> Regarding the current tasks, I’m working on PR #278 to address Issue #275.
>
> How about we wait until PR #278 is merged before starting with the vLLM wrapper? I believe this sequence would be more efficient and help us avoid redundant work.
>
> Also, I have a quick question: Will the 'uds tokenizer' be separate from the 'vLLM render' mentioned in this comment? Or are they expected to be the same thing? Thanks!

Sounds good! I think we can hold this PR for now, and I'll try to rebase and reintegrate it after #278 is merged.

From a functional perspective, the UDS tokenizer should perform tasks similar to the vLLM renderer. I believe that the final reasonable architecture would be to migrate the UDS tokenizer into the vLLM renderer or at least reuse the renderer's code (depending on the final implementation) once all vLLM renderer-related matters are settled. (Want to confirm this with @vMaroon )

At this stage, the UDS tokenizer can at least take on the task of offloading Python dependencies from llm-d-inference-scheduler (considering this comment).

@vMaroon
Member

vMaroon commented Feb 9, 2026

#278 was merged

@delavet
Collaborator Author

delavet commented Feb 10, 2026

> #278 was merged

Thank you! I will rebase and refactor the code this week to get it back in shape for review. Sorry if my responses are delayed in the next few days as we're preparing for Chinese New Year.

@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch from 87269df to 49eadd9 on February 14, 2026 09:38
@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch 2 times, most recently from 5e4cbbe to 2ee9e84 on February 14, 2026 10:16
Comment on lines +257 to +271
// Use offset_pairs field in format [start, end, start, end, ...]
var tokenizersOffsets []types.Offset

if len(resp.OffsetPairs) > 0 && len(resp.OffsetPairs)%2 == 0 {
    pairCount := len(resp.OffsetPairs) / 2
    tokenizersOffsets = make([]types.Offset, pairCount)
    for i := 0; i < pairCount; i++ {
        start := resp.OffsetPairs[2*i]
        end := resp.OffsetPairs[2*i+1]
        tokenizersOffsets[i] = types.Offset{uint(start), uint(end)}
    }
} else {
    return nil, nil, fmt.Errorf("invalid offset_pairs field in response")
}
Member

This offset parsing logic is duplicated between Render and RenderChat (line 183+). Can we extract a helper function/method?
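The pairing logic the reviewer suggests factoring out could look roughly like this. A minimal sketch, shown in Python for brevity (the actual code is Go, and the name `parse_offset_pairs` is hypothetical):

```python
def parse_offset_pairs(flat_pairs):
    """Convert a flat [start, end, start, end, ...] list into (start, end) tuples.

    Raises ValueError for an empty or odd-length list, mirroring the error
    branch in the snippet above.
    """
    if not flat_pairs or len(flat_pairs) % 2 != 0:
        raise ValueError("invalid offset_pairs field in response")
    return [(flat_pairs[2 * i], flat_pairs[2 * i + 1])
            for i in range(len(flat_pairs) // 2)]
```

Both Render and RenderChat could then delegate to the one helper, keeping validation and pairing in a single place.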

Comment on lines +156 to +157
for entry in request.chat_template_kwargs:
    chat_template_kwargs[entry.key] = self._protobuf_value_to_python(entry.value)
Member

Not a python expert but shouldn't we use request.chat_template_kwargs.items(): ?
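For context on the question above: whether `.items()` is needed depends on the field type. Iterating a protobuf map field (like a plain dict) yields only the keys, so `entry.key` would fail there and `.items()` is required; iterating a repeated key/value message yields entry objects, for which the original `entry.key` form is correct. A sketch with plain-Python stand-ins (the entry objects here are hypothetical stand-ins for generated protobuf types):

```python
from types import SimpleNamespace

# Map-style field: iteration yields keys, so .items() is needed to get both.
map_style = {"temperature": 0.2, "enable_thinking": True}
from_map = {}
for key, value in map_style.items():
    from_map[key] = value

# Repeated key/value entries: iteration yields the entry objects directly.
repeated_style = [SimpleNamespace(key="temperature", value=0.2),
                  SimpleNamespace(key="enable_thinking", value=True)]
from_repeated = {}
for entry in repeated_style:
    from_repeated[entry.key] = entry.value

assert from_map == from_repeated
```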

Comment on lines +67 to +68
for entry in value.struct_value.fields:
    result[entry.key] = self._protobuf_value_to_python(value.struct_value.fields[entry.key])
Member

Same question on the use of .items()

# logging.info(f"Received tokenize request for model: {request.model_name}")
# Get tokenizer key from model name mapping
model_name = request.model_name
if model_name not in self.model_to_key_map:
Member

@pierDipi commented Feb 17, 2026

(same disclaimer as above) Is accessing this "global" variable safe for concurrent access (also considering the newer free-threaded/no-GIL Python builds and their weaker guarantees)?
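On the concurrency question: a mapping that is fully built during initialization and only read afterwards is generally safe to share between threads, but if it can be mutated at runtime, guarding it with a lock avoids relying on GIL-specific behavior. A minimal sketch (class and attribute names are illustrative, not from the PR):

```python
import threading


class ModelKeyRegistry:
    """Thread-safe wrapper around a model-name -> tokenizer-key mapping."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._model_to_key = dict(initial or {})

    def get(self, model_name):
        # Hold the lock for reads too, so lookups never race with updates.
        with self._lock:
            return self._model_to_key.get(model_name)

    def register(self, model_name, key):
        with self._lock:
            self._model_to_key[model_name] = key


registry = ModelKeyRegistry({"some-org/some-model": "some-key"})
assert registry.get("some-org/some-model") == "some-key"
assert registry.get("unknown/model") is None
```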

Makefile Outdated
Comment on lines +40 to +41
BREW_PREFIX := $(shell command -v brew >/dev/null 2>&1 && brew --prefix python@$(PYTHON_VERSION) 2>/dev/null)
PYTHON_CONFIG := $(BREW_PREFIX)/bin/python$(PYTHON_VERSION)-config
BREW_PREFIX := $(shell command -v brew >/dev/null 2>&1 && brew --prefix python@$(PYTHON_VERSION) 2>/dev/null)
PYTHON_CONFIG := $(BREW_PREFIX)/bin/python$(PYTHON_VERSION)-config
Member

do we need the additional indentation?

ModelName: u.model,
EnableThinking: false, // Can be made configurable later
AddGenerationPrompt: true, // Can be made configurable later
IsLocal: false, // Use configured value, default to true
Member

The comment says "default to true" but the value is false. The proto definition comment also says "default: true". This contradiction is confusing.

@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch from 2ee9e84 to e2a2a4a on February 25, 2026 01:34
@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch from e2a2a4a to 2e3881a on February 25, 2026 10:15
@delavet
Collaborator Author

delavet commented Feb 26, 2026

It seems that the CI failed due to #358, and I am currently investigating the cause.



Development

Successfully merging this pull request may close these issues.

Disaggregated Tokenization (Tokenization + Chat-Templating)

4 participants