feat(ops): Add AETHER Geometric Sparse Attention Operator #1370
teerthsharma wants to merge 2 commits into facebookresearch:main
Conversation
AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) introduces
geometric block-sparse attention that achieves O(N_relevant) scaling by pruning
blocks based on mathematical upper bounds.
Mathematical Foundation:
- Uses Cauchy-Schwarz inequality to compute block interaction upper bounds
- Score_UB(Q_block, K_block) = max_{q ∈ Q_block} (q·μ_K + ||q||·r_K)
- Blocks with upper_bound < threshold are safely skipped (exact, not approximate)
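The bound can be sanity-checked numerically. Below is a minimal NumPy sketch (the helper name `block_upper_bound` is ours, not from the PR): for every key k in a block, q·k = q·μ + q·(k − μ) ≤ q·μ + ||q||·||k − μ|| ≤ q·μ + ||q||·r, so the block-level bound always dominates the true maximum score and pruning on it never drops a relevant block.

```python
import numpy as np

def block_upper_bound(Q_block, K_block):
    """Cauchy-Schwarz bound: for any k in the block,
    q.k = q.mu + q.(k - mu) <= q.mu + ||q||*||k - mu|| <= q.mu + ||q||*r."""
    mu = K_block.mean(axis=0)                       # block centroid mu_K
    r = np.linalg.norm(K_block - mu, axis=1).max()  # block radius r_K
    per_query = Q_block @ mu + np.linalg.norm(Q_block, axis=1) * r
    return per_query.max()

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # a block of 4 queries, head_dim 8
K = rng.normal(size=(16, 8))   # a block of 16 keys
ub = block_upper_bound(Q, K)
exact_max = (Q @ K.T).max()    # true maximum interaction score
assert exact_max <= ub + 1e-9  # the bound is never violated
```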
Implementation:
- Two Triton kernels: geometry computation + sparse attention
- _compute_block_geometry_kernel: Computes centroids and radii for Key blocks
- _geometric_sparse_attention_kernel: Online softmax with geometric pruning
- Full autograd support with backward pass
- Conditional Triton imports for cross-platform compatibility
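As a reference for what the geometry pass produces, here is a plain-NumPy sketch of the per-block centroid/radius computation (illustrating the semantics of `_compute_block_geometry_kernel`, not the Triton code itself; the function name and shapes are our assumptions):

```python
import numpy as np

def compute_block_geometry(K, block_size):
    """Split keys (n, d) into blocks and return per-block centroids and radii.
    Handles a ragged final block when n is not a multiple of block_size."""
    n, d = K.shape
    n_blocks = (n + block_size - 1) // block_size
    centroids = np.zeros((n_blocks, d))
    radii = np.zeros(n_blocks)
    for b in range(n_blocks):
        blk = K[b * block_size:(b + 1) * block_size]
        centroids[b] = blk.mean(axis=0)                              # mu_K
        radii[b] = np.linalg.norm(blk - centroids[b], axis=1).max()  # r_K
    return centroids, radii

# Identical keys collapse to a single point: radius must be zero.
K = np.ones((10, 4))
cents, rads = compute_block_geometry(K, block_size=4)
```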
Performance Targets:
- 4K seq uniform: ~1x (overhead from geometry)
- 32K seq clustered: ~4x speedup
- 128K seq typical: ~7x speedup
- 1M seq sparse: enables previously OOM workloads
Files:
- xformers/ops/aether_attention.py: Core operator (~500 LOC)
- xformers/ops/__init__.py: Export aether_attention, AetherAttention
- tests/test_aether_attention.py: Test suite
Signed-off-by: Teerth Sharma <teerthsharma@github.com>
- Add AETHER section to ops.rst with autodoc integration
- Expand test coverage from 8 to 30+ test cases across 9 test classes
- Add gradient correctness, determinism, edge case, and stress tests
- Add block geometry verification tests
- Match xFormers quality standards for operators
Hi @teerthsharma! Thank you for your pull request and welcome to our community.

**Action Required**

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process**

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
**Summary**

This PR introduces AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) Attention, a novel geometric sparse attention operator. AETHER achieves $O(N_{relevant})$ scaling by geometrically pruning key blocks that fall outside the active query manifold.

**Key Features**

- Geometric Pruning: Prunes blocks where the upper-bound interaction score is below a threshold.
- Triton Implementation: High-performance fused kernels for block geometry computation and sparse attention.
- Drop-in Replacement: Compatible with standard attention APIs.
**Architecture**

AETHER treats the Key cache as a set of geometric clusters. For each Key block we precompute a "Hyper-Bounding Box": the centroid $\mu_K$ and the radius $r_K$ (the largest distance from any key in the block to the centroid).

**1. The Inequality (The Gate)**

We use the Cauchy-Schwarz inequality to derive a strict upper bound on the dot product between a Query $q$ and any Key $k$ in a block:

$$q \cdot k = q \cdot \mu_K + q \cdot (k - \mu_K) \le q \cdot \mu_K + \|q\| \, r_K$$

If this upper bound is less than our threshold $\tau$, the entire block is safely skipped.

**2. System Flow**

```mermaid
graph TD
subgraph HBM [Phase 1: Heavy Memory HBM]
K[Input Keys K]
V[Input Values V]
end
subgraph Precalc [Phase 2: Geometric Indexing]
K -->|Split into Blocks| B[Key Blocks]
B -->|Compute Centroid and Radius| Meta[Geometric Metadata]
end
subgraph Runtime [Phase 3: The Filter Per Query]
Q[Input Query q]
Meta -->|Load to SRAM| Gate{Is Bound under Threshold?}
Q --> Gate
Gate -- YES Prune --> Skip[SKIP BLOCK 0 FLOPs]
Gate -- NO Keep --> Comp[COMPUTE Standard Attn]
end
Skip --> Out[Output]
Comp --> Out
```
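The Phase-3 gate can be sketched end-to-end in NumPy (function names are ours, and the real operator fuses this into a Triton kernel with an online softmax rather than materializing kept blocks): with a threshold low enough to keep every block, the result matches dense attention exactly.

```python
import numpy as np

def dense_attention(q, K, V):
    """Single-query softmax attention, numerically stabilized."""
    s = K @ q
    w = np.exp(s - s.max())
    return (w / w.sum()) @ V

def aether_like_attention(q, K, V, block_size, tau):
    """Keep only key blocks whose Cauchy-Schwarz bound reaches tau."""
    kept_K, kept_V = [], []
    q_norm = np.linalg.norm(q)
    for start in range(0, len(K), block_size):
        blk_K = K[start:start + block_size]
        mu = blk_K.mean(axis=0)
        r = np.linalg.norm(blk_K - mu, axis=1).max()
        if q @ mu + q_norm * r >= tau:   # the gate: bound under tau => skip
            kept_K.append(blk_K)
            kept_V.append(V[start:start + block_size])
    if not kept_K:                        # every block pruned
        return np.zeros(V.shape[1])
    return dense_attention(q, np.concatenate(kept_K), np.concatenate(kept_V))

rng = np.random.default_rng(1)
q = rng.normal(size=8)
K = rng.normal(size=(64, 8))
V = rng.normal(size=(64, 8))
out = aether_like_attention(q, K, V, block_size=16, tau=-1e9)  # keeps all blocks
assert np.allclose(out, dense_attention(q, K, V))
```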
**Description**

This PR introduces AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) Attention, a novel geometric sparse attention operator. AETHER achieves $O(N_{relevant})$ scaling by geometrically pruning key blocks that fall outside the active query manifold using Cauchy-Schwarz upper bounds on the dot product.
**Key Features**

- Geometric Pruning: Prunes blocks where the upper-bound interaction score is below a threshold.
- Triton Implementation: High-performance fused kernels for block geometry computation and sparse attention.
- Drop-in Replacement: Compatible with standard attention APIs.
**Changes**

- Added `xformers/ops/aether_attention.py`: Core implementation and Triton kernels.
- Added usage documentation to `docs/source/components/ops.rst`.
- Added comprehensive test suite in `tests/test_aether_attention.py` (matching xFormers quality standards).