
uds tokenizer using vllm wrapper #280

Open
delavet wants to merge 3 commits into llm-d:main from delavet:community/uds-tokenizer-using-vllm-wrapper

Conversation

@delavet
Collaborator

@delavet delavet commented Feb 3, 2026

fix #126

This PR updates how the disaggregated UDS tokenizer service works: the service now performs ApplyChatTemplate and Encode through the vLLM wrapper.

@hyeongyun0916
Collaborator

Regarding the current tasks, I’m working on PR #278 to address Issue #275.

How about we wait until PR #278 is merged before starting with the vLLM wrapper? I believe this sequence would be more efficient and help us avoid redundant work.

Also, I have a quick question: Will the 'uds tokenizer' be separate from the 'vLLM render' mentioned in this comment? Or are they expected to be the same thing? Thanks!

@delavet
Collaborator Author

delavet commented Feb 3, 2026

> Regarding the current tasks, I’m working on PR #278 to address Issue #275.
>
> How about we wait until PR #278 is merged before starting with the vLLM wrapper? I believe this sequence would be more efficient and help us avoid redundant work.
>
> Also, I have a quick question: Will the 'uds tokenizer' be separate from the 'vLLM render' mentioned in this comment? Or are they expected to be the same thing? Thanks!

Sounds good! I think we can hold this PR for now, and I'll try to rebase and reintegrate it after #278 is merged.

From a functional perspective, the UDS tokenizer should perform tasks similar to the vLLM renderer. I believe that the final reasonable architecture would be to migrate the UDS tokenizer into the vLLM renderer or at least reuse the renderer's code (depending on the final implementation) once all vLLM renderer-related matters are settled. (Want to confirm this with @vMaroon )

At this stage, the UDS tokenizer can at least take on the task of offloading Python dependencies from llm-d-inference-scheduler (considering this comment).

@vMaroon
Member

vMaroon commented Feb 9, 2026

#278 was merged

@delavet
Collaborator Author

delavet commented Feb 10, 2026

> #278 was merged

Thank you! I will rebase and refactor the code this week to get it back in shape for review. Sorry if my responses are delayed in the next few days as we're preparing for Chinese New Year.

@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch from 87269df to 49eadd9 on February 14, 2026 09:38
@github-actions

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch 2 times, most recently from 5e4cbbe to 2ee9e84 on February 14, 2026 10:16
Comment on lines +257 to +271
// Use offset_pairs field in format [start, end, start, end, ...]
var tokenizersOffsets []types.Offset

if len(resp.OffsetPairs) > 0 && len(resp.OffsetPairs)%2 == 0 {
    pairCount := len(resp.OffsetPairs) / 2
    tokenizersOffsets = make([]types.Offset, pairCount)
    for i := 0; i < pairCount; i++ {
        start := resp.OffsetPairs[2*i]
        end := resp.OffsetPairs[2*i+1]
        tokenizersOffsets[i] = types.Offset{uint(start), uint(end)}
    }
} else {
    return nil, nil, fmt.Errorf("invalid offset_pairs field in response")
}
Member

This offset parsing logic is duplicated between Render and RenderChat (line 183+). Can we extract a helper function/method?
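The pairing logic the reviewer suggests factoring out could look roughly like this. A minimal sketch, shown in Python for brevity (the actual code is Go, and the name `parse_offset_pairs` is hypothetical):

```python
def parse_offset_pairs(flat_pairs):
    """Convert a flat [start, end, start, end, ...] list into (start, end) tuples.

    Raises ValueError for an empty or odd-length list, mirroring the error
    branch in the snippet above.
    """
    if not flat_pairs or len(flat_pairs) % 2 != 0:
        raise ValueError("invalid offset_pairs field in response")
    return [(flat_pairs[2 * i], flat_pairs[2 * i + 1])
            for i in range(len(flat_pairs) // 2)]
```

Both Render and RenderChat could then delegate to the one helper, keeping validation and pairing in a single place.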

Comment on lines +156 to +157
for entry in request.chat_template_kwargs:
    chat_template_kwargs[entry.key] = self._protobuf_value_to_python(entry.value)
Member

Not a python expert but shouldn't we use request.chat_template_kwargs.items(): ?
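For context on the question above: whether `.items()` is needed depends on the field type. Iterating a protobuf map field (like a plain dict) yields only the keys, so `entry.key` would fail there and `.items()` is required; iterating a repeated key/value message yields entry objects, for which the original `entry.key` form is correct. A sketch with plain-Python stand-ins (the entry objects here are hypothetical stand-ins for generated protobuf types):

```python
from types import SimpleNamespace

# Map-style field: iteration yields keys, so .items() is needed to get both.
map_style = {"temperature": 0.2, "enable_thinking": True}
from_map = {}
for key, value in map_style.items():
    from_map[key] = value

# Repeated key/value entries: iteration yields the entry objects directly.
repeated_style = [SimpleNamespace(key="temperature", value=0.2),
                  SimpleNamespace(key="enable_thinking", value=True)]
from_repeated = {}
for entry in repeated_style:
    from_repeated[entry.key] = entry.value

assert from_map == from_repeated
```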

Comment on lines +67 to +68
for entry in value.struct_value.fields:
    result[entry.key] = self._protobuf_value_to_python(value.struct_value.fields[entry.key])
Member

Same question on the use of .items()

# logging.info(f"Received tokenize request for model: {request.model_name}")
# Get tokenizer key from model name mapping
model_name = request.model_name
if model_name not in self.model_to_key_map:
Member

@pierDipi commented Feb 17, 2026

(same disclaimer as above) Is accessing this "global" variable safe for concurrent access (also considering the newer free-threaded/no-GIL Python builds and their weaker guarantees)?
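On the concurrency question: a mapping that is fully built during initialization and only read afterwards is generally safe to share between threads, but if it can be mutated at runtime, guarding it with a lock avoids relying on GIL-specific behavior. A minimal sketch (class and attribute names are illustrative, not from the PR):

```python
import threading


class ModelKeyRegistry:
    """Thread-safe wrapper around a model-name -> tokenizer-key mapping."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._model_to_key = dict(initial or {})

    def get(self, model_name):
        # Hold the lock for reads too, so lookups never race with updates.
        with self._lock:
            return self._model_to_key.get(model_name)

    def register(self, model_name, key):
        with self._lock:
            self._model_to_key[model_name] = key


registry = ModelKeyRegistry({"some-org/some-model": "some-key"})
assert registry.get("some-org/some-model") == "some-key"
assert registry.get("unknown/model") is None
```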

Makefile Outdated
Comment on lines +40 to +41
BREW_PREFIX := $(shell command -v brew >/dev/null 2>&1 && brew --prefix python@$(PYTHON_VERSION) 2>/dev/null)
PYTHON_CONFIG := $(BREW_PREFIX)/bin/python$(PYTHON_VERSION)-config
BREW_PREFIX := $(shell command -v brew >/dev/null 2>&1 && brew --prefix python@$(PYTHON_VERSION) 2>/dev/null)
PYTHON_CONFIG := $(BREW_PREFIX)/bin/python$(PYTHON_VERSION)-config
Member

do we need the additional indentation?

ModelName: u.model,
EnableThinking: false, // Can be made configurable later
AddGenerationPrompt: true, // Can be made configurable later
IsLocal: false, // Use configured value, default to true
Member

The comment says "default to true" but the value is false. The proto definition comment also says "default: true". This contradiction is confusing.

@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch from 2ee9e84 to e2a2a4a on February 25, 2026 01:34
@delavet force-pushed the community/uds-tokenizer-using-vllm-wrapper branch from e2a2a4a to 2e3881a on February 25, 2026 10:15
@delavet
Collaborator Author

delavet commented Feb 26, 2026

It seems that the CI failed due to #358, and I am currently investigating the cause.



Development

Successfully merging this pull request may close these issues.

Disaggregated Tokenization (Tokenization + Chat-Templating)

4 participants