Layerwise benchmarks #9890
youki-sada asked this question in Q&A
There are three benchmark options for getting layerwise latency. However, none of them is available in the latest version of TensorRT-LLM. Do you have any idea how to get a layerwise profile for arbitrary models in the latest TensorRT-LLM?
- `benchmark.py` can generate layerwise latency with the `--dump_profile` option, but this script has been obsolete since v0.20.0.
- `cudaEventSynchronize` in `nsys-rep` and the kernel execution time is too short compared to the actual latency.