Diagnostics Report — 8 failure(s), 8 harness issue(s) (mlx-vlm 0.4.1)
(This issue is truncated, for the full report, see https://github.com/jrp2014/check_models/blob/main/src/output/diagnostics.md)
Summary
Automated benchmarking of 51 locally cached VLM models found 8 hard failures and 8 harness/integration issues, plus 1 preflight compatibility warning in otherwise-successful models. 43 of 51 models succeeded.
Test image: 20260321-182222_DSC09486_DxO.jpg (33.8 MB).
Action Summary
Quick triage list with likely owner and next action for each issue class.
- [Medium] [transformers] Failed to process inputs with error: can only concatenate str (not "NoneType") to str (1 model(s)). Next: verify API compatibility and pinned version floor.
- [Medium] [mlx-vlm] 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte (1 model(s)). Next: check processor/chat-template wiring and generation kwargs.
- [Medium] [mlx-vlm] 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte (1 model(s)). Next: check processor/chat-template wiring and generation kwargs.
- [Medium] [transformers] Failed to process inputs with error: Only returning PyTorch tensors is currently supp... (4 model(s)). Next: verify API compatibility and pinned version floor.
- [Medium] [model configuration/repository] Loaded processor has no image_processor; expected multimodal processor. (1 model(s)). Next: verify model config, tokenizer files, and revision alignment.
- [Medium] [mlx-vlm] Harness/integration warnings on 4 model(s). Next: check processor/chat-template wiring and generation kwargs.
- [Medium] [mlx-vlm / mlx] Harness/integration warnings on 2 model(s). Next: validate long-context handling and stop-token behavior across mlx-vlm + mlx runtime.
- [Medium] [model-config / mlx-vlm] Harness/integration warnings on 2 model(s). Next: validate chat-template/config expectations and mlx-vlm prompt formatting for this model.
- [Medium] [transformers / mlx-vlm] Stack-signal anomalies on 1 successful model(s). Next: verify API compatibility and pinned version floor.
- [Medium] [transformers] Preflight compatibility warnings (1 issue(s)). Next: verify API compatibility and pinned version floor.
Priority Summary
| Priority | Issue | Models Affected | Owner | Next Action |
|---|---|---|---|---|
| Medium | Failed to process inputs with error: can only concatenate str (not "N... | 1 (Florence-2-large-ft) | transformers | verify API compatibility and pinned version floor. |
| Medium | 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte | 1 (InternVL3-8B-bf16) | mlx-vlm | check processor/chat-template wiring and generation kwargs. |
| Medium | 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte | 1 (Molmo-7B-D-0924-bf16) | mlx-vlm | check processor/chat-template wiring and generation kwargs. |
| Medium | Failed to process inputs with error: Only returning PyTorch tensors i... | 1 (Qwen3.5-27B-4bit) | transformers | verify API compatibility and pinned version floor. |
| Medium | Failed to process inputs with error: Only returning PyTorch tensors i... | 1 (Qwen3.5-27B-mxfp8) | transformers | verify API compatibility and pinned version floor. |
| Medium | Failed to process inputs with error: Only returning PyTorch tensors i... | 1 (Qwen3.5-35B-A3B-6bit) | transformers | verify API compatibility and pinned version floor. |
| Medium | Failed to process inputs with error: Only returning PyTorch tensors i... | 1 (Qwen3.5-35B-A3B-bf16) | transformers | verify API compatibility and pinned version floor. |
| Medium | Loaded processor has no image_processor; expected multimodal processor. | 1 (deepseek-vl2-8bit) | model configuration/repository | verify model config, tokenizer files, and revision alignment. |
| Medium | Harness/integration | 4 (Phi-3.5-vision-instruct, Devstral-Small-2-24B-Instruct-2512-5bit, ERNIE-4.5-VL-28B-A3B-Thinking-bf16, Florence-2-large-ft) | mlx-vlm | check processor/chat-template wiring and generation kwargs. |
| Medium | Harness/integration | 2 (Qwen3-VL-2B-Thinking-bf16, X-Reasoner-7B-8bit) | mlx-vlm / mlx | validate long-context handling and stop-token behavior across mlx-vlm + mlx runtime. |
| Medium | Harness/integration | 2 (Qwen2-VL-2B-Instruct-4bit, paligemma2-10b-ft-docci-448-bf16) | model-config / mlx-vlm | validate chat-template/config expectations and mlx-vlm prompt formatting for this model. |
| Medium | Stack-signal anomaly | 1 (Qwen3-VL-2B-Instruct) | transformers / mlx-vlm | verify API compatibility and pinned version floor. |
| Medium | Preflight compatibility warning | 1 issue(s) | transformers | verify API compatibility and pinned version floor. |
1. Failure affecting 1 model (Priority: Medium)
Observed behavior: Failed to process inputs with error: can only concatenate str (not "NoneType") to str
Owner (likely component): transformers
Suggested next action: verify API compatibility and pinned version floor.
Affected model: microsoft/Florence-2-large-ft
| Model | Observed Behavior | First Seen Failing | Recent Repro |
|---|---|---|---|
| microsoft/Florence-2-large-ft | Failed to process inputs with error: can only concatenate str (not "NoneType") to str | 2026-02-07 20:59:01 GMT | 3/3 recent runs failed |
To reproduce
- Repro command (exact run):
```shell
python -m check_models --image /Users/jrp/Pictures/Processed/20260321-182222_DSC09486_DxO.jpg --trust-remote-code --max-tokens 500 --temperature 0.0 --top-p 1.0 --repetition-context-size 20 --prefill-step-size 4096 --timeout 300.0 --verbose --models microsoft/Florence-2-large-ft
```
Detailed trace logs (affected model)
microsoft/Florence-2-large-ft
Traceback:

```
Traceback (most recent call last):
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 1019, in process_inputs_with_fallback
return process_inputs(
processor,
...<5 lines>...
**kwargs,
)
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 1005, in process_inputs
return process_method(**args)
File "/Users/jrp/miniconda3/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/florence2/processing_florence2.py", line 185, in __call__
self.image_token * self.num_image_tokens
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ self.tokenizer.bos_token
^~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 694, in generate
for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 537, in stream_generate
inputs = prepare_inputs(
processor,
...<6 lines>...
**kwargs,
)
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 1237, in prepare_inputs
inputs = process_inputs_with_fallback(
processor,
...<4 lines>...
**kwargs,
)
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 1029, in process_inputs_with_fallback
raise ValueError(f"Failed to process inputs with error: {e}")
ValueError: Failed to process inputs with error: can only concatenate str (not "NoneType") to str

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
ValueError: Model generation failed for microsoft/Florence-2-large-ft: Failed to process inputs with error: can only concatenate str (not "NoneType") to str
```
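The `TypeError` comes from Florence-2's processor concatenating `self.tokenizer.bos_token`, which this tokenizer reports as `None`. A minimal sketch of the failing expression with a defensive fallback (a hypothetical helper, not the actual transformers code):

```python
def build_image_prompt(image_token: str, num_image_tokens: int, bos_token):
    # processing_florence2.py computes image_token * num_image_tokens + bos_token;
    # with bos_token=None that raises TypeError. Falling back to "" avoids the
    # crash, though the proper fix may be to ship a bos_token in the tokenizer config.
    return image_token * num_image_tokens + (bos_token or "")

prompt = build_image_prompt("<image>", 3, None)
assert prompt == "<image><image><image>"
```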
Captured stdout/stderr:
=== STDOUT ===
==========
Files: ['/', 'U', 's', 'e', 'r', 's', '/', 'j', 'r', 'p', '/', 'P', 'i', 'c', 't', 'u', 'r', 'e', 's', '/', 'P', 'r', 'o', 'c', 'e', 's', 's', 'e', 'd', '/', '2', '0', '2', '6', '0', '3', '2', '1', '-', '1', '8', '2', '2', '2', '2', '_', 'D', 'S', 'C', '0', '9', '4', '8', '6', '_', 'D', 'x', 'O', '.', 'j', 'p', 'g']
Prompt: Analyze this image for cataloguing metadata, using British English.
Use only details that are clearly and definitely visible in the image. If a detail is uncertain, ambiguous, partially obscured, too small to verify, or not directly visible, leave it out. Do not guess.
Treat the metadata hints below as a draft catalog record. Keep only details that are clearly confirmed by the image, correct anything contradicted by the image, and add important visible details that are definitely present.
Return exactly these three sections, and nothing else:
Title:
- 5-10 words, concrete and factual, limited to clearly visible content.
- Output only the title text after the label.
- Do not repeat or paraphrase these instructions in the title.
Description:
- 1-2 factual sentences describing the main visible subject, setting, lighting, action, and other distinctive visible details. Omit anything uncertain or inferred.
- Output only the description text after the label.
Keywords:
- 10-18 unique comma-separated terms based only on clearly visible subjects, setting, colors, composition, and style. Omit uncertain tags rather than guessing.
- Output only the keyword list after the label.
Rules:
- Include only details that are definitely visible in the image.
- Reuse metadata terms only when they are clearly supported by the image.
- If metadata and image disagree, follow the image.
- Prefer omission to speculation.
- Do not copy prompt instructions into the Title, Description, or Keywords fields.
- Do not infer identity, location, event, brand, species, time period, or intent unless visually obvious.
- Do not output reasoning, notes, hedging, or extra sections.
Context: Existing metadata hints (high confidence; use only when visually confirmed):
- Description hint: Pedestrians cross a footbridge over a canal at dusk in a vibrant urban waterside area. A modern glass building reflects the golden light of the setting sun against a purple twilight sky, while people walk along the towpath, relax on the bank, and socialize at a nearby restaurant. Moored boats line the canal, completing the lively evening scene as people go about their daily lives, commuting or enjoying leisure time.
- Capture metadata: Taken on 2026-03-21 18:22:22 GMT (at 18:22:22 local time). GPS: 51.536500°N, 0.126500°W.
=== STDERR ===
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 10 files: 0%| | 0/10 [00:00<?, ?it/s]
Fetching 10 files: 100%|##########| 10/10 [00:00<00:00, 38444.58it/s]
Download complete: : 0.00B [00:00, ?B/s]
Download complete: : 0.00B [00:00, ?B/s]
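A separate harness oddity is visible in the captured stdout above: the `Files:` line prints the image path exploded into single characters, the classic symptom of passing a `str` where a list of paths is expected. A small illustration (variable names are ours, not the harness's):

```python
path = "/Users/jrp/Pictures/example.jpg"  # hypothetical path

# Iterating the string itself yields characters, matching the log output:
assert list(path)[:3] == ["/", "U", "s"]

# Wrapping the path in a list keeps it intact:
assert list([path]) == ["/Users/jrp/Pictures/example.jpg"]
```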
2. Failure affecting 1 model (Priority: Medium)
Observed behavior: 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte
Owner (likely component): mlx-vlm
Suggested next action: check processor/chat-template wiring and generation kwargs.
Affected model: mlx-community/InternVL3-8B-bf16
| Model | Observed Behavior | First Seen Failing | Recent Repro |
|---|---|---|---|
| mlx-community/InternVL3-8B-bf16 | 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte | 2026-02-23 12:54:48 GMT | 2/3 recent runs failed |
To reproduce
- Repro command (exact run):
```shell
python -m check_models --image /Users/jrp/Pictures/Processed/20260321-182222_DSC09486_DxO.jpg --trust-remote-code --max-tokens 500 --temperature 0.0 --top-p 1.0 --repetition-context-size 20 --prefill-step-size 4096 --timeout 300.0 --verbose --models mlx-community/InternVL3-8B-bf16
```
Detailed trace logs (affected model)
mlx-community/InternVL3-8B-bf16
Traceback:

```
Traceback (most recent call last):
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 694, in generate
for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 596, in stream_generate
detokenizer.add_token(token, skip_special_token_ids=skip_special_token_ids)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/tokenizer_utils.py", line 232, in add_token
).decode("utf-8")
~~~~~~^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
ValueError: Model generation failed for mlx-community/InternVL3-8B-bf16: 'utf-8' codec can't decode byte 0xab in position 10: invalid start byte
```
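This `UnicodeDecodeError` is characteristic of decoding a multi-byte UTF-8 sequence from the middle: 0xab is a continuation byte, so it becomes an "invalid start byte" when a token boundary splits a character. A sketch of the usual remedy, buffering bytes through an incremental decoder (illustrative only; the actual fix in `tokenizer_utils.py` may differ):

```python
import codecs

data = "é".encode("utf-8")  # b"\xc3\xa9": one character, two bytes

# Decoding a partial sequence fails, as in the detokenizer:
try:
    data[:1].decode("utf-8")
except UnicodeDecodeError:
    pass

# An incremental decoder holds incomplete sequences until more bytes arrive:
dec = codecs.getincrementaldecoder("utf-8")()
text = dec.decode(data[:1]) + dec.decode(data[1:], final=True)
assert text == "é"
```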
Captured stdout/stderr:
=== STDOUT ===
==========
Files: ['/', 'U', 's', 'e', 'r', 's', '/', 'j', 'r', 'p', '/', 'P', 'i', 'c', 't', 'u', 'r', 'e', 's', '/', 'P', 'r', 'o', 'c', 'e', 's', 's', 'e', 'd', '/', '2', '0', '2', '6', '0', '3', '2', '1', '-', '1', '8', '2', '2', '2', '2', '_', 'D', 'S', 'C', '0', '9', '4', '8', '6', '_', 'D', 'x', 'O', '.', 'j', 'p', 'g']
Prompt: User: <image>
Analyze this image for cataloguing metadata, using British English.
Use only details that are clearly and definitely visible in the image. If a detail is uncertain, ambiguous, partially obscured, too small to verify, or not directly visible, leave it out. Do not guess.
Treat the metadata hints below as a draft catalog record. Keep only details that are clearly confirmed by the image, correct anything contradicted by the image, and add important visible details that are definitely present.
Return exactly these three sections, and nothing else:
Title:
- 5-10 words, concrete and factual, limited to clearly visible content.
- Output only the title text after the label.
- Do not repeat or paraphrase these instructions in the title.
Description:
- 1-2 factual sentences describing the main visible subject, setting, lighting, action, and other distinctive visible details. Omit anything uncertain or inferred.
- Output only the description text after the label.
Keywords:
- 10-18 unique comma-separated terms based only on clearly visible subjects, setting, colors, composition, and style. Omit uncertain tags rather than guessing.
- Output only the keyword list after the label.
Rules:
- Include only details that are definitely visible in the image.
- Reuse metadata terms only when they are clearly supported by the image.
- If metadata and image disagree, follow the image.
- Prefer omission to speculation.
- Do not copy prompt instructions into the Title, Description, or Keywords fields.
- Do not infer identity, location, event, brand, species, time period, or intent unless visually obvious.
- Do not output reasoning, notes, hedging, or extra sections.
Context: Existing metadata hints (high confidence; use only when visually confirmed):
- Description hint: Pedestrians cross a footbridge over a canal at dusk in a vibrant urban waterside area. A modern glass building reflects the golden light of the setting sun against a purple twilight sky, while people walk along the towpath, relax on the bank, and socialize at a nearby restaurant. Moored boats line the canal, completing the lively evening scene as people go about their daily lives, commuting or enjoying leisure time.
- Capture metadata: Taken on 2026-03-21 18:22:22 GMT (at 18:22:22 local time). GPS: 51.536500°N, 0.126500°W.
Assistant:
感 Rencontre pestic Rencontre.ERR Rencontre enthus.ERR Rencontre醍racial pestic
=== STDERR ===
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 17 files: 0%| | 0/17 [00:00<?, ?it/s]
Fetching 17 files: 100%|##########| 17/17 [00:00<00:00, 9459.16it/s]
Download complete: : 0.00B [00:00, ?B/s]
Download complete: : 0.00B [00:00, ?B/s]
3. Failure affecting 1 model (Priority: Medium)
Observed behavior: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte
Owner (likely component): mlx-vlm
Suggested next action: check processor/chat-template wiring and generation kwargs.
Affected model: mlx-community/Molmo-7B-D-0924-bf16
| Model | Observed Behavior | First Seen Failing | Recent Repro |
|---|---|---|---|
| mlx-community/Molmo-7B-D-0924-bf16 | 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte | 2026-03-22 01:27:09 GMT | 2/3 recent runs failed |
To reproduce
- Repro command (exact run):
```shell
python -m check_models --image /Users/jrp/Pictures/Processed/20260321-182222_DSC09486_DxO.jpg --trust-remote-code --max-tokens 500 --temperature 0.0 --top-p 1.0 --repetition-context-size 20 --prefill-step-size 4096 --timeout 300.0 --verbose --models mlx-community/Molmo-7B-D-0924-bf16
```
Detailed trace logs (affected model)
mlx-community/Molmo-7B-D-0924-bf16
Traceback:

```
Traceback (most recent call last):
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 694, in generate
for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 596, in stream_generate
detokenizer.add_token(token, skip_special_token_ids=skip_special_token_ids)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/tokenizer_utils.py", line 232, in add_token
).decode("utf-8")
~~~~~~^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
ValueError: Model generation failed for mlx-community/Molmo-7B-D-0924-bf16: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte
```
Captured stdout/stderr:
=== STDOUT ===
==========
Files: ['/', 'U', 's', 'e', 'r', 's', '/', 'j', 'r', 'p', '/', 'P', 'i', 'c', 't', 'u', 'r', 'e', 's', '/', 'P', 'r', 'o', 'c', 'e', 's', 's', 'e', 'd', '/', '2', '0', '2', '6', '0', '3', '2', '1', '-', '1', '8', '2', '2', '2', '2', '_', 'D', 'S', 'C', '0', '9', '4', '8', '6', '_', 'D', 'x', 'O', '.', 'j', 'p', 'g']
Prompt: Analyze this image for cataloguing metadata, using British English.
Use only details that are clearly and definitely visible in the image. If a detail is uncertain, ambiguous, partially obscured, too small to verify, or not directly visible, leave it out. Do not guess.
Treat the metadata hints below as a draft catalog record. Keep only details that are clearly confirmed by the image, correct anything contradicted by the image, and add important visible details that are definitely present.
Return exactly these three sections, and nothing else:
Title:
- 5-10 words, concrete and factual, limited to clearly visible content.
- Output only the title text after the label.
- Do not repeat or paraphrase these instructions in the title.
Description:
- 1-2 factual sentences describing the main visible subject, setting, lighting, action, and other distinctive visible details. Omit anything uncertain or inferred.
- Output only the description text after the label.
Keywords:
- 10-18 unique comma-separated terms based only on clearly visible subjects, setting, colors, composition, and style. Omit uncertain tags rather than guessing.
- Output only the keyword list after the label.
Rules:
- Include only details that are definitely visible in the image.
- Reuse metadata terms only when they are clearly supported by the image.
- If metadata and image disagree, follow the image.
- Prefer omission to speculation.
- Do not copy prompt instructions into the Title, Description, or Keywords fields.
- Do not infer identity, location, event, brand, species, time period, or intent unless visually obvious.
- Do not output reasoning, notes, hedging, or extra sections.
Context: Existing metadata hints (high confidence; use only when visually confirmed):
- Description hint: Pedestrians cross a footbridge over a canal at dusk in a vibrant urban waterside area. A modern glass building reflects the golden light of the setting sun against a purple twilight sky, while people walk along the towpath, relax on the bank, and socialize at a nearby restaurant. Moored boats line the canal, completing the lively evening scene as people go about their daily lives, commuting or enjoying leisure time.
- Capture metadata: Taken on 2026-03-21 18:22:22 GMT (at 18:22:22 local time). GPS: 51.536500°N, 0.126500°W.
=== STDERR ===
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 18 files: 0%| | 0/18 [00:00<?, ?it/s]
Fetching 18 files: 100%|##########| 18/18 [00:00<00:00, 11023.14it/s]
Download complete: : 0.00B [00:00, ?B/s]
Download complete: : 0.00B [00:00, ?B/s]
4. Failure affecting 1 model (Priority: Medium)
Observed behavior: Failed to process inputs with error: Only returning PyTorch tensors is currently supported.
Owner (likely component): transformers
Suggested next action: verify API compatibility and pinned version floor.
Affected model: mlx-community/Qwen3.5-27B-4bit
| Model | Observed Behavior | First Seen Failing | Recent Repro |
|---|---|---|---|
| mlx-community/Qwen3.5-27B-4bit | Failed to process inputs with error: Only returning PyTorch tensors is currently supported. | 2026-03-22 01:27:09 GMT | 3/3 recent runs failed |
To reproduce
- Repro command (exact run):
```shell
python -m check_models --image /Users/jrp/Pictures/Processed/20260321-182222_DSC09486_DxO.jpg --trust-remote-code --max-tokens 500 --temperature 0.0 --top-p 1.0 --repetition-context-size 20 --prefill-step-size 4096 --timeout 300.0 --verbose --models mlx-community/Qwen3.5-27B-4bit
```
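The four Qwen3.5 failures suggest the model's processor accepts only `return_tensors="pt"` while the harness requests another tensor format. A hedged fallback sketch, converting PyTorch output to NumPy for the MLX pipeline (function and stub names are ours, not mlx-vlm's API; the stub stands in for the real processor):

```python
import numpy as np

def to_numpy_inputs(process, **kwargs):
    """Request framework-neutral arrays; fall back to converting PyTorch output."""
    try:
        return process(return_tensors="np", **kwargs)
    except ValueError:
        # Processor only supports return_tensors="pt"; convert after the fact.
        pt_inputs = process(return_tensors="pt", **kwargs)
        return {k: np.asarray(v) for k, v in pt_inputs.items()}

# Stub mimicking a processor that insists on PyTorch tensors:
def strict_processor(return_tensors=None, **kwargs):
    if return_tensors != "pt":
        raise ValueError("Only returning PyTorch tensors is currently supported.")
    return {"input_ids": [[1, 2, 3]]}  # stands in for a torch.Tensor

inputs = to_numpy_inputs(strict_processor)
assert isinstance(inputs["input_ids"], np.ndarray)
```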
Detailed trace logs (affected model)
mlx-community/Qwen3.5-27B-4bit
Traceback:

```
Traceback (most recent call last):
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 1019, in process_inputs_with_fallback
return process_inputs(
processor,
...<5 lines>...
**kwargs,
)
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 1005, in process_inputs
```
... Etc. Further details at https://github.com/jrp2014/check_models/blob/main/src/output/diagnostics.md