Hey! Really digging the tiered KV cache (SSD offloading) design, it's super handy for long contexts on Mac. Just wondering—does oMLX support 4-bit KV cache quantization yet? Or is it something on the roadmap? I'm trying to push the context limit as far as possible on a memory-constrained machine. Any plans for this? Thanks!
As you can see in ml-explore/mlx-lm#941, the continuous batching that oMLX relies on at its core doesn't support KV cache quantization yet (that PR is still open, and it doesn't appear to implement proper quantization anyway). The first problem is that mlx-lm, oMLX's backend, doesn't support it. But even if I tried to implement it separately, I honestly have a lot of doubts about the effectiveness of KV cache quantization. If you've tried a 4-bit KV cache, you probably know this already: it has a devastating impact, especially on the long-context agentic tasks that oMLX is primarily targeting. 4-bit is practically unusable in my opinion, and I'm skeptical that 8-bit produces acceptable quality either. That's my take on it. Feel free to add any thoughts!
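If it helps to see where the quality loss comes from, here's a minimal round-trip sketch of group-wise affine quantization applied to Gaussian KV-like data. This is purely illustrative: the `fake_quantize` helper, the group size, and the asymmetric min/max scheme are my assumptions for the sketch, not mlx-lm's actual kernels, but they show how fast the reconstruction error grows when you drop from 8 bits to 4.

```python
import numpy as np

def fake_quantize(x, bits=4, group_size=64):
    """Quantize-then-dequantize round trip, group-wise affine (asymmetric).
    Illustrative only; not mlx-lm's actual implementation."""
    orig_shape = x.shape
    g = x.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    # One scale/zero-point per group of `group_size` values.
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((g - lo) / scale)            # integer codes in [0, 2^bits - 1]
    return (q * scale + lo).reshape(orig_shape).astype(x.dtype)

rng = np.random.default_rng(0)
# Toy KV tensor: (heads, tokens, head_dim) of roughly unit-scale values.
kv = rng.standard_normal((8, 1024, 128)).astype(np.float32)

for bits in (8, 4):
    err = np.abs(fake_quantize(kv, bits=bits) - kv)
    rel = err.mean() / np.abs(kv).mean()
    print(f"{bits}-bit mean relative error: {rel:.2%}")
```

On data like this, the 4-bit error is roughly an order of magnitude larger than the 8-bit error, and in attention that error compounds over every cached token, which is why long-context tasks feel it first.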