Conversation
Regarding the current tasks, I'm working on PR #278 to address Issue #275. How about we wait until PR #278 is merged before starting on the vLLM wrapper? I believe this sequence would be more efficient and help us avoid redundant work. Also, a quick question: will the UDS tokenizer be separate from the vLLM renderer mentioned in this comment, or are they expected to be the same thing? Thanks!
Sounds good! I think we can hold this PR for now, and I'll try to rebase and reintegrate it after #278 is merged. Functionally, the UDS tokenizer should perform tasks similar to the vLLM renderer. I believe the final reasonable architecture would be to migrate the UDS tokenizer into the vLLM renderer, or at least reuse the renderer's code (depending on the final implementation), once all vLLM renderer-related matters are settled. (Want to confirm this with @vMaroon.) At this stage, the UDS tokenizer can at least take on the task of offloading Python dependencies from
#278 was merged
Thank you! I will rebase and refactor the code this week to get it back in shape for review. Sorry if my responses are delayed in the next few days as we're preparing for Chinese New Year.
Force-pushed from 87269df to 49eadd9
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see the GitHub Documentation.
Force-pushed from 5e4cbbe to 2ee9e84
pkg/tokenization/uds_tokenizer.go
Outdated
    var tokenizersOffsets []types.Offset

    if len(resp.OffsetPairs) > 0 && len(resp.OffsetPairs)%2 == 0 {
        // Use offset_pairs field in format [start, end, start, end, ...]
        pairCount := len(resp.OffsetPairs) / 2
        tokenizersOffsets = make([]types.Offset, pairCount)
        for i := 0; i < pairCount; i++ {
            start := resp.OffsetPairs[2*i]
            end := resp.OffsetPairs[2*i+1]
            tokenizersOffsets[i] = types.Offset{uint(start), uint(end)}
        }
    } else {
        return nil, nil, fmt.Errorf("invalid offset_pairs field in response")
    }
This offset parsing logic is duplicated between Render and RenderChat (line 183+). Can we extract a helper function/method?
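Such a helper might look like the following sketch. It is self-contained for illustration: `Offset` here is a local stand-in for the repo's `types.Offset`, and the function name `parseOffsetPairs` is an assumption, not code from the PR.

```go
package main

import "fmt"

// Offset is a local stand-in for the repo's types.Offset (start/end pair).
type Offset struct {
	Start, End uint
}

// parseOffsetPairs converts a flat [start, end, start, end, ...] slice into
// Offset values, rejecting empty or odd-length input, so both Render and
// RenderChat could share this validation and conversion logic.
func parseOffsetPairs(pairs []uint32) ([]Offset, error) {
	if len(pairs) == 0 || len(pairs)%2 != 0 {
		return nil, fmt.Errorf("invalid offset_pairs field in response: length %d", len(pairs))
	}
	offsets := make([]Offset, len(pairs)/2)
	for i := range offsets {
		offsets[i] = Offset{Start: uint(pairs[2*i]), End: uint(pairs[2*i+1])}
	}
	return offsets, nil
}

func main() {
	offsets, err := parseOffsetPairs([]uint32{0, 5, 6, 11})
	fmt.Println(offsets, err)
}
```

Both call sites would then reduce to a single call plus error propagation.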
    for entry in request.chat_template_kwargs:
        chat_template_kwargs[entry.key] = self._protobuf_value_to_python(entry.value)
Not a Python expert, but shouldn't this iterate over request.chat_template_kwargs.items()? Iterating a mapping directly yields only the keys.
    for entry in value.struct_value.fields:
        result[entry.key] = self._protobuf_value_to_python(value.struct_value.fields[entry.key])
Same question on the use of .items()
    # logging.info(f"Received tokenize request for model: {request.model_name}")
    # Get tokenizer key from model name mapping
    model_name = request.model_name
    if model_name not in self.model_to_key_map:
(Same disclaimer as above.) Is accessing this "global" mapping safe for concurrent access, especially given the newer free-threaded (no-GIL) builds and their weaker guarantees?
Makefile
Outdated
    BREW_PREFIX := $(shell command -v brew >/dev/null 2>&1 && brew --prefix python@$(PYTHON_VERSION) 2>/dev/null)
    PYTHON_CONFIG := $(BREW_PREFIX)/bin/python$(PYTHON_VERSION)-config
        BREW_PREFIX := $(shell command -v brew >/dev/null 2>&1 && brew --prefix python@$(PYTHON_VERSION) 2>/dev/null)
        PYTHON_CONFIG := $(BREW_PREFIX)/bin/python$(PYTHON_VERSION)-config
Do we need the additional indentation?
pkg/tokenization/uds_tokenizer.go
Outdated
    ModelName:           u.model,
    EnableThinking:      false, // Can be made configurable later
    AddGenerationPrompt: true,  // Can be made configurable later
    IsLocal:             false, // Use configured value, default to true
The comment says "default to true" but the value is false. The proto definition comment also says "default: true". This contradiction is confusing.
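One way to resolve the hard-coded flags and honor the documented defaults is a functional-options constructor. This is a hypothetical sketch, not the PR's code: the `renderOptions` type, `Option` helpers, and the specific default values (`AddGenerationPrompt: true`, `IsLocal: true` per the proto comment) are assumptions for illustration.

```go
package main

import "fmt"

// renderOptions is a hypothetical container for the flags that are currently
// hard-coded in the render request ("Can be made configurable later").
type renderOptions struct {
	EnableThinking      bool
	AddGenerationPrompt bool
	IsLocal             bool
}

// Option mutates renderOptions; callers override only what they need.
type Option func(*renderOptions)

func WithEnableThinking(v bool) Option { return func(o *renderOptions) { o.EnableThinking = v } }
func WithIsLocal(v bool) Option        { return func(o *renderOptions) { o.IsLocal = v } }

// newRenderOptions applies the documented defaults (assumed here to be
// AddGenerationPrompt: true and IsLocal: true, matching the proto comment),
// then any caller-supplied overrides.
func newRenderOptions(opts ...Option) renderOptions {
	o := renderOptions{AddGenerationPrompt: true, IsLocal: true}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := newRenderOptions(WithEnableThinking(true))
	fmt.Println(o.EnableThinking, o.AddGenerationPrompt, o.IsLocal)
}
```

With defaults in one place, the struct literal, its comments, and the proto documentation can no longer drift apart.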
Force-pushed from 2ee9e84 to e2a2a4a
Force-pushed from e2a2a4a to 2e3881a
It seems that the CI failed due to #358, and I am currently investigating the cause.
Fixes #126
This PR updates the way the disaggregated UDS tokenizer service works. The service now completes ApplyChatTemplate and Encode by using the vLLM wrapper.