My Mac: M5 Max, 128GB (18/40)
oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: Qwen3.5-2B-MLX-8bit
Single Request Results
| Test | TTFT (ms) | TPOT (ms) | pp TPS (tok/s) | tg TPS (tok/s) | E2E (s) | Throughput (tok/s) | Peak Mem (GB) |
|---|---|---|---|---|---|---|---|
| pp1024/tg128 | 94.6 | 4.88 | 10826.2 | 206.5 | 0.714 | 1612.6 | 3.44 |
| pp4096/tg128 | 354.1 | 5.09 | 11567.6 | 198.0 | 1.001 | 4221.6 | 3.97 |
| pp8192/tg128 | 700.7 | 5.35 | 11692.0 | 188.3 | 1.380 | 6027.8 | 4.28 |
| pp16384/tg128 | 1480.6 | 6.08 | 11066.1 | 165.7 | 2.253 | 7328.9 | 4.59 |
| pp32768/tg128 | 3372.7 | 7.38 | 9715.6 | 136.5 | 4.310 | 7631.9 | 5.11 |
| pp65536/tg128 | 8748.7 | 10.06 | 7491.0 | 100.1 | 10.027 | 6548.9 | 6.60 |
| pp131072/tg128 | 28744.6 | 17.75 | 4559.9 | 56.8 | 30.999 | 4232.4 | 9.51 |
| pp200000/tg128 | 59790.8 | 20.93 | 3345.0 | 48.1 | 62.449 | 3204.6 | 12.55 |
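For anyone wanting to sanity-check the columns, here is a minimal sketch of how the rate columns seem to relate to TTFT and E2E. The formulas are my assumption, inferred from the numbers above (they reproduce the rows to within rounding); they are not taken from the oMLX source, and `derived_metrics` is a hypothetical helper name.

```python
def derived_metrics(prompt_tokens, gen_tokens, ttft_ms, e2e_s):
    """Recompute rate columns from raw timings (assumed formulas, not oMLX's code)."""
    ttft_s = ttft_ms / 1000.0
    # Prefill rate: prompt tokens processed before the first output token.
    pp_tps = prompt_tokens / ttft_s
    # Generation rate: output tokens over the time remaining after TTFT.
    tg_tps = gen_tokens / (e2e_s - ttft_s)
    # Overall throughput: all tokens (prompt + generated) over end-to-end time.
    throughput = (prompt_tokens + gen_tokens) / e2e_s
    return pp_tps, tg_tps, throughput


# pp1024/tg128 row: TTFT 94.6 ms, E2E 0.714 s
pp, tg, total = derived_metrics(1024, 128, 94.6, 0.714)
print(f"pp {pp:.1f} tok/s, tg {tg:.1f} tok/s, throughput {total:.1f} tok/s")
```

The recomputed values differ slightly from the reported ones, presumably because the table shows rounded TTFT and E2E.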
Continuous Batching — Same Prompt
pp1024 / tg128 · partial prefix cache hit
| Batch | tg TPS (tok/s) | Speedup | pp TPS (tok/s) | pp TPS/req (tok/s) | TTFT (ms) | E2E (s) |
|---|---|---|---|---|---|---|
| 1x | 206.5 | 1.00x | 10826.2 | 10826.2 | 94.6 | 0.714 |
| 2x | 385.9 | 1.87x | 9994.7 | 4997.4 | 204.8 | 0.868 |
| 4x | 597.4 | 2.89x | 10882.8 | 2720.7 | 376.3 | 1.233 |
| 8x | 688.8 | 3.34x | 10173.4 | 1271.7 | 805.1 | 2.292 |
Continuous Batching — Different Prompts
pp1024 / tg128 · no cache reuse
| Batch | tg TPS (tok/s) | Speedup | pp TPS (tok/s) | pp TPS/req (tok/s) | TTFT (ms) | E2E (s) |
|---|---|---|---|---|---|---|
| 1x | 206.5 | 1.00x | 10826.2 | 10826.2 | 94.6 | 0.714 |
| 2x | 385.9 | 1.87x | 10742.4 | 5371.2 | 190.6 | 0.854 |
| 4x | 576.8 | 2.79x | 10442.8 | 2610.7 | 392.1 | 1.280 |
| 8x | 697.4 | 3.38x | 10278.7 | 1284.8 | 796.9 | 2.265 |
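The Speedup and pp TPS/req columns in the batching tables look like simple ratios. A small sketch under that assumption (again inferred from the numbers, not from oMLX itself; the function names are hypothetical):

```python
def speedup(batch_tg_tps, single_tg_tps):
    """Aggregate generation rate relative to the 1x (single request) run."""
    return batch_tg_tps / single_tg_tps


def pp_tps_per_request(batch_pp_tps, batch_size):
    """Aggregate prefill rate split evenly across the requests in the batch."""
    return batch_pp_tps / batch_size


# "Same Prompt" table: batch size -> (tg TPS, pp TPS), copied from the post
same_prompt = {1: (206.5, 10826.2), 2: (385.9, 9994.7),
               4: (597.4, 10882.8), 8: (688.8, 10173.4)}
baseline_tg = same_prompt[1][0]

for batch, (tg, pp) in same_prompt.items():
    print(f"{batch}x: speedup {speedup(tg, baseline_tg):.2f}x, "
          f"pp TPS/req {pp_tps_per_request(pp, batch):.1f} tok/s")
```

Read this way, aggregate pp TPS stays roughly flat around 10k tok/s at every batch size, so per-request prefill speed drops about in proportion to the batch, while generation throughput scales up to about 3.3x at 8 concurrent requests.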