Layerwise benchmarks #9890

youki-sada · 2025-12-11T04:47:17Z

There are three benchmark options for getting layerwise latency. However, these are not available in the latest version of TensorRT-LLM. Is there a way to get a layerwise profile for arbitrary models in the latest TensorRT-LLM?

benchmark.py can generate layerwise latency with the --dump_profile option, but this script has been obsolete since v0.20.0.
The layer-wise benchmarks support only the HF models DeepSeek and Qwen-Next, not quantized TRT models.
As for nsys profiling, it doesn't provide accurate latency. There are too many cudaEventSynchronize calls in the nsys-rep, and the kernel execution time is too short compared to the actual latency.

Replies: 0 comments