Thanks for providing an open-source implementation of DeepSeek R1. It is simple and easy to understand.
The benchmark results you have shared for TPU-v5e-64 report a throughput of 71 tokens/sec:
https://github.com/jax-ml/jax-llm-examples/tree/main/deepseek_r1_jax#inference-performance-results
The LMSys + SGLang implementation of DeepSeek R1 reports a per-device throughput of 5600 tokens/sec/GPU.
- Why is there such a significant difference in throughput? (A first-order approximation is enough.)
- Is your throughput number tokens/sec or tokens/sec/TPU?
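For context on why the second question matters, here is a quick back-of-envelope sketch of the gap under both possible readings of the 71 tokens/sec figure (this is my own arithmetic, not anything from the repo):

```python
# Back-of-envelope comparison of the two reported throughput figures.
# Assumption: it is unclear whether 71 tokens/sec on TPU-v5e-64 is
# aggregate across all 64 chips or per chip, so both readings are shown.
tpu_chips = 64
tpu_throughput = 71       # tokens/sec as reported in the repo README
gpu_per_device = 5600     # tokens/sec/GPU from the LMSys + SGLang numbers

# Reading A: 71 tokens/sec is the aggregate over all 64 chips.
per_chip_if_aggregate = tpu_throughput / tpu_chips   # ~1.1 tokens/sec/chip
gap_a = gpu_per_device / per_chip_if_aggregate       # ~5000x per device

# Reading B: 71 tokens/sec is already a per-chip number.
gap_b = gpu_per_device / tpu_throughput              # ~79x per device

print(f"Reading A: {per_chip_if_aggregate:.2f} tok/s/chip, gap ~{gap_a:.0f}x")
print(f"Reading B: gap ~{gap_b:.0f}x")
```

Either way the gap is large, which is why a first-order explanation (batch size, sharding strategy, hardware FLOPs, etc.) would be helpful.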