A lot of long-context performance discussions stall because people run different scripts, different sequence lengths, and different metrics, so the results are not comparable.
Request: add (or document) one canonical “long-context attention boundary” benchmark for xFormers that anyone can run:
- a single command/config (dtype, head dims, seq length(s), batch/concurrency, device),
- the metric to report (tokens/s or time/op + p99; memory bandwidth/bytes moved optional),
- and a short note that this boundary is stressing attention/KV-related memory traffic (not model quality).
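To make the ask concrete, here is a minimal sketch of the harness shape I have in mind. Everything here is an assumption, not an existing xFormers entrypoint: `bench` and its parameters are hypothetical names, and the dummy lambda stands in for whatever the canonical attention callable turns out to be (e.g. something wrapping `xformers.ops.memory_efficient_attention`).

```python
# Hypothetical harness shape for a canonical "long-context attention
# boundary" benchmark. The attention callable is a stand-in; a real run
# would wrap the actual xFormers attention op with fixed dtype/device.
import time
import statistics


def bench(attn_fn, *, seq_len, batch, iters=50, warmup=5):
    """Time attn_fn and report time/op (mean, p99) and tokens/s."""
    for _ in range(warmup):          # discard warmup iterations
        attn_fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        attn_fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    mean_s = statistics.mean(times)
    p99_s = times[min(len(times) - 1, int(0.99 * len(times)))]
    return {
        "mean_ms": mean_s * 1e3,
        "p99_ms": p99_s * 1e3,
        "tokens_per_s": batch * seq_len / mean_s,
    }


if __name__ == "__main__":
    # Dummy workload standing in for the attention kernel.
    result = bench(lambda: sum(range(10_000)), seq_len=32_768, batch=1)
    print(result)
```

The point is not this specific code, just that the config (seq length, batch, dtype, device) and the reported numbers (time/op mean + p99, derived tokens/s) are pinned in one place so runs are comparable.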
If you point me to the best existing benchmark entrypoint and the metric you consider “the” acceptance currency, I can run it and return receipts (before/after on the same boundary) instead of opinions.