Aggregated totals by architecture and backend:
```
   architecture                   backend  frame_count  duration_s    avg_fps
0  ViT-B-16-SigLIP2-256           MPS             1637       28.21  58.029068
1  ViT-L-16-SigLIP-256            MPS             1637       91.39  17.912244
2  ViT-L-16-SigLIP-384            MPS             1637      226.56   7.225459
3  ViT-SO400M-14-SigLIP-384       MPS             1637      483.48   3.385869
4  siglip-large-patch16-384       MLX             1637      180.88   9.050199
5  siglip-large-patch16-384-4bit  MLX             1637      208.80   7.840038
6  siglip-so400m-patch14-224      MLX             1637      106.37  15.389678
7  siglip-so400m-patch14-384      MLX             1637      341.62   4.791874
```
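For context, avg_fps is just frame_count / duration_s (e.g. 1637 / 28.21 ≈ 58.03 for the first row), and the timing harness is roughly the following sketch. `encode_fn` and `batches` are placeholders, not my exact code:

```python
import time

def benchmark(encode_fn, batches):
    # Hypothetical harness: time encode_fn over all batches and
    # report total frames, wall-clock duration, and average fps.
    n_frames = 0
    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
        n_frames += len(batch)
    duration_s = time.perf_counter() - start
    return n_frames, duration_s, n_frames / duration_s
```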

I was expecting a more significant speedup; perhaps I'm missing something? I'm taking frames extracted from videos and encoding them with batch size 32 on an MBP M1 Pro 16GB. Here are some snippets of my code:
```python
def process_batch(self, batch: torch.Tensor) -> torch.Tensor:
    # batch has shape [B, 3, H, W] on CPU (by default).
    if self.is_mlx_model:
        mx_in = self.mx.array(batch)
        # Cast to the vision tower's parameter dtype.
        dtype = (
            self.mlx_model.vision_model.vision_model
            .embeddings.patch_embedding.weight.dtype
        )
        # MLX vision models expect NHWC, so permute from NCHW.
        mx_in = mx_in.transpose(0, 2, 3, 1).astype(dtype)
        features = self.mlx_model.get_image_features(
            pixel_values=mx_in, return_dict=False, output_attentions=False
        )
        return torch.from_numpy(np.array(features, copy=False))
```
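The `transpose(0, 2, 3, 1)` is just the NCHW → NHWC permutation MLX's vision models expect; a minimal NumPy illustration of the same axis swap:

```python
import numpy as np

# PyTorch batches are NCHW: [B, C, H, W]. MLX vision models take NHWC.
batch = np.zeros((32, 3, 256, 256), dtype=np.float32)
nhwc = batch.transpose(0, 2, 3, 1)
print(nhwc.shape)  # (32, 256, 256, 3)
```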
In the MLX pipeline I also avoid shuttling the pre-allocated tensors between CPU and MPS; in the MPS pipeline, by contrast, I allocate them on CPU for frame extraction and move the whole batch over to MPS once extraction is done, before computing embeddings. Maybe (hopefully) I'm missing something? It will be interesting to see how SigLIP2 performs on MLX regardless; as you can see, I included ViT-B-16-SigLIP2-256 and it's by far the fastest.