A lot of long-context performance discussions stall because people run different scripts, different sequence lengths, and different metrics, so the results are not comparable.
Request: add (or document) one canonical “long-context attention boundary” benchmark for xFormers that anyone can run:
- a single command/config (dtype, head dims, seq length(s), batch/concurrency, device),
- the metric to report (tokens/s or time/op + p99; memory bandwidth/bytes moved optional),
- and a short note that this boundary is stressing attention/KV-related memory traffic (not model quality).
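To make the ask concrete, here is a minimal sketch of the harness shape I have in mind. Everything here is an assumption, not an existing xFormers entrypoint: `bench` and its parameters are hypothetical names, and the dummy lambda stands in for whatever the canonical attention callable turns out to be (e.g. something wrapping `xformers.ops.memory_efficient_attention`).

```python
# Hypothetical harness shape for a canonical "long-context attention
# boundary" benchmark. The attention callable is a stand-in; a real run
# would wrap the actual xFormers attention op with fixed dtype/device.
import time
import statistics


def bench(attn_fn, *, seq_len, batch, iters=50, warmup=5):
    """Time attn_fn and report time/op (mean, p99) and tokens/s."""
    for _ in range(warmup):          # discard warmup iterations
        attn_fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        attn_fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    mean_s = statistics.mean(times)
    p99_s = times[min(len(times) - 1, int(0.99 * len(times)))]
    return {
        "mean_ms": mean_s * 1e3,
        "p99_ms": p99_s * 1e3,
        "tokens_per_s": batch * seq_len / mean_s,
    }


if __name__ == "__main__":
    # Dummy workload standing in for the attention kernel.
    result = bench(lambda: sum(range(10_000)), seq_len=32_768, batch=1)
    print(result)
```

The point is not this specific code, just that the config (seq length, batch, dtype, device) and the reported numbers (time/op mean + p99, derived tokens/s) are pinned in one place so runs are comparable.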
If you point me to the best existing benchmark entrypoint and the metric you consider “the” acceptance currency, I can run it and return receipts (before/after on the same boundary) instead of opinions.