Summary
mlx_vlm.server still crashes with HTTP 500 on some GLM-OCR region crops due to strict UTF-8 decoding in BPEStreamingDetokenizer.add_token().
Error:
```json
{"detail":"Generation failed: 'utf-8' codec can't decode byte 0xd9 in position 1: invalid continuation byte"}
```
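Python raises exactly this message whenever `0xd9` (a lead byte that starts a two-byte UTF-8 sequence) is followed by a byte outside the continuation range `0x80`-`0xBF`. A minimal sketch that reproduces it without the server, using made-up bytes rather than the actual token bytes GLM-OCR produced:

```python
# Illustrative bytes (assumption, not the real model output): 'A', then 0xd9,
# which starts a two-byte UTF-8 sequence, then 'A' again. 'A' is not a valid
# continuation byte (0x80-0xBF), so strict decoding fails.
bad = b"A\xd9A"

message = None
try:
    bad.decode("utf-8")  # strict decoding, no error handler
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'utf-8' codec can't decode byte 0xd9 in position 1: invalid continuation byte
```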
Environment
- mlx-vlm: 0.4.0
- Python: 3.14.2
- Platform: macOS arm64
- Server command: `mlx_vlm.server --trust-remote-code --port 8099 --model mlx-community/GLM-OCR-bf16`

Minimal repro
This reproduces a failing crop from page 1 of the GLM-OCR technical report:
```python
import base64
from io import BytesIO

import pypdfium2 as pdfium
import requests

pdf = "./2603.10910.pdf"  # https://arxiv.org/pdf/2603.10910
page_idx = 0
bbox = [371, 194, 626, 227]  # normalized 0-1000

# Render the page at 220 DPI (PDF units are 1/72 inch) and crop the region
p = pdfium.PdfDocument(pdf)
img = p[page_idx].render(scale=220 / 72).to_pil()
p.close()
w, h = img.size
x1, y1, x2, y2 = [
    int(bbox[0] * w / 1000),
    int(bbox[1] * h / 1000),
    int(bbox[2] * w / 1000),
    int(bbox[3] * h / 1000),
]
crop = img.crop((x1, y1, x2, y2))

# Encode the crop as a PNG data URL
buf = BytesIO()
crop.save(buf, format="PNG")
data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

payload = {
    "model": "mlx-community/GLM-OCR-bf16",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": "Recognize the text in the image and output in Markdown format."},
        ],
    }],
    "max_tokens": 4096,
    "temperature": 0.01,
    "top_p": 0.00001,
    "top_k": 1,
    "repetition_penalty": 1.1,
}

r = requests.post("http://127.0.0.1:8099/v1/chat/completions", json=payload, timeout=180)
print(r.status_code)
print(r.text)
```

Observed:
- Status: 500
- Body: includes the UTF-8 decode exception
Suspected source
In `mlx_vlm/tokenizer_utils.py`, `BPEStreamingDetokenizer.add_token()` decodes without an error handler:
```python
.decode("utf-8")
```
whereas `finalize()` already uses tolerant decoding (`errors="ignore"`).
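To make the difference between the handlers concrete, here is a small sketch of how strict, `errors="replace"`, and `errors="ignore"` decoding treat the same kind of invalid sequence. The byte string is an illustrative assumption: `replace` leaves a visible U+FFFD marker where the bad byte was, while `ignore` drops it silently.

```python
# Illustrative bytes (assumption): 0xd9 starts a two-byte sequence but is
# followed by 'A', which is not a continuation byte.
raw = b"A\xd9A"

# Strict decoding (no error handler) raises UnicodeDecodeError.
strict_failed = False
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True

replaced = raw.decode("utf-8", errors="replace")  # "A\ufffdA": bad byte -> U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # "AA": bad byte dropped silently

print(strict_failed, len(replaced), len(ignored))
```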
Request
Could this be fixed by making add_token() tolerant as well (e.g. errors="replace" or errors="ignore"), so generation does not crash on these byte sequences?
This is currently a blocker for local GLM-OCR layout-region workflows.
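If the bytes that `add_token()` decodes can end partway through a multi-byte character, applying `errors="replace"` to each chunk on its own would also corrupt valid characters that happen to be split across tokens. One possible approach, sketched here as an assumption rather than a description of how `BPEStreamingDetokenizer` is actually structured, is Python's incremental UTF-8 decoder (`codecs.getincrementaldecoder`): it holds an incomplete trailing sequence until the next bytes arrive and substitutes U+FFFD only for bytes that are genuinely invalid. `TolerantByteStream`, `decode_stream`, and `decode_per_chunk` are hypothetical names, and the byte chunks stand in for per-token bytes.

```python
import codecs


class TolerantByteStream:
    """Hypothetical helper (not part of mlx-vlm): decodes per-token bytes
    incrementally and never raises on invalid UTF-8."""

    def __init__(self) -> None:
        # errors="replace": invalid bytes become U+FFFD instead of raising.
        # The incremental decoder buffers an incomplete trailing sequence
        # until the next chunk arrives.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self.text = ""

    def add_bytes(self, chunk: bytes) -> str:
        """Feed one token's raw bytes; return only the newly decoded text."""
        segment = self._decoder.decode(chunk)
        self.text += segment
        return segment

    def finalize(self) -> str:
        """Flush the buffer; a dangling partial sequence becomes U+FFFD."""
        segment = self._decoder.decode(b"", final=True)
        self.text += segment
        return segment


def decode_stream(chunks: list[bytes]) -> str:
    """Decode a sequence of byte chunks with TolerantByteStream."""
    stream = TolerantByteStream()
    for chunk in chunks:
        stream.add_bytes(chunk)
    stream.finalize()
    return stream.text


def decode_per_chunk(chunks: list[bytes]) -> str:
    """Naive alternative: apply errors="replace" to each chunk on its own."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


split_char = [b"caf", b"\xc3", b"\xa9"]  # "é" (C3 A9) split across two chunks
invalid = [b"A", b"\xd9", b"A"]          # 0xd9 not followed by a continuation byte

print(decode_stream(split_char) == "caf\u00e9")           # character rejoined
print(decode_per_chunk(split_char) == "caf\ufffd\ufffd")  # character lost
print(decode_stream(invalid) == "A\ufffdA")               # no exception raised
```

Swapping `"replace"` for `"ignore"` would drop the invalid byte instead of marking it; in either case the request would complete rather than return HTTP 500.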