Add a canonical long-context attention boundary benchmark (one command + metric) #1369

@StanByriukov02

Description

A lot of long-context performance discussions stall because people run different scripts, different sequence lengths, and different metrics, so the results are not comparable.

Request: add (or document) one canonical “long-context attention boundary” benchmark for xFormers that anyone can run:

  • a single command/config (dtype, head dims, seq length(s), batch/concurrency, device),
  • the metric to report (tokens/s or time/op + p99; memory bandwidth/bytes moved optional),
  • and a short note that this boundary is stressing attention/KV-related memory traffic (not model quality).
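As a concrete sketch of the metric side of the bullets above, here is a minimal, stdlib-only harness that reports tokens/s and p99 latency for any callable. The `benchmark` helper, the warmup/iteration counts, and the placeholder workload are all hypothetical illustrations, not an existing xFormers entrypoint; in practice the callable would wrap an attention call at a fixed dtype, head dim, and sequence length (and CUDA timing would need synchronization).

```python
import math
import time

def benchmark(fn, tokens_per_call, iters=50, warmup=5):
    """Time fn() repeatedly; report tokens/s and p99 per-call latency (seconds)."""
    for _ in range(warmup):  # discard warmup iterations (caches, JIT, clocks)
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return {
        "tokens_per_s": tokens_per_call * iters / sum(times),
        # p99 latency: the value below which 99% of per-call times fall
        "p99_s": sorted(times)[min(iters - 1, math.ceil(0.99 * iters) - 1)],
    }

# Placeholder workload standing in for one attention call at seq_len=8192.
result = benchmark(lambda: sum(range(10_000)), tokens_per_call=8192)
print(result)
```

Fixing these two numbers (throughput plus a tail-latency percentile) is what makes two runs comparable; either one alone can hide regressions.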

If you point me to the best existing benchmark entrypoint and the metric you consider the acceptance currency, I can run it and return receipts (before/after numbers on the same boundary) instead of opinions.
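The "receipts" format could be as simple as the following sketch: a one-line before/after report over the two agreed metrics. The `receipts` helper and the sample numbers are hypothetical, just to show the shape of a comparable report on the same boundary.

```python
def receipts(before_tps, after_tps, before_p99, after_p99):
    """Format a before/after comparison measured on the same benchmark boundary.

    before_tps/after_tps: tokens per second; before_p99/after_p99: p99 latency in seconds.
    """
    speedup = after_tps / before_tps
    p99_delta_pct = (after_p99 - before_p99) / before_p99 * 100
    return (
        f"tokens/s: {before_tps:.1f} -> {after_tps:.1f} ({speedup:.2f}x), "
        f"p99: {before_p99 * 1e3:.2f}ms -> {after_p99 * 1e3:.2f}ms ({p99_delta_pct:+.1f}%)"
    )

# Hypothetical numbers from two runs of the same canonical command:
print(receipts(12000.0, 15000.0, 0.020, 0.016))
# -> tokens/s: 12000.0 -> 15000.0 (1.25x), p99: 20.00ms -> 16.00ms (-20.0%)
```

Because both runs use the same command, config, and metric definitions, the delta is attributable to the code change rather than to benchmarking differences.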
