feat(ops): Add AETHER Geometric Sparse Attention Operator #1370
teerthsharma wants to merge 2 commits into facebookresearch:main
Conversation
AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) introduces
geometric block-sparse attention that achieves O(N_relevant) scaling by pruning
blocks based on mathematical upper bounds.
Mathematical Foundation:
- Uses Cauchy-Schwarz inequality to compute block interaction upper bounds
- Score_UB(Q_block, K_block) = max_{q ∈ Q_block} (q·μ_K + ||q||·r_K)
- Blocks with upper_bound < threshold are safely skipped (exact, not approximate)
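The bound can be sanity-checked numerically. Below is a minimal NumPy sketch (the helper name `block_upper_bound` is ours, not from the PR): for every key k in a block, q·k = q·μ + q·(k − μ) ≤ q·μ + ||q||·||k − μ|| ≤ q·μ + ||q||·r, so the block-level bound always dominates the true maximum score and pruning on it never drops a relevant block.

```python
import numpy as np

def block_upper_bound(Q_block, K_block):
    """Cauchy-Schwarz bound: for any k in the block,
    q.k = q.mu + q.(k - mu) <= q.mu + ||q||*||k - mu|| <= q.mu + ||q||*r."""
    mu = K_block.mean(axis=0)                       # block centroid mu_K
    r = np.linalg.norm(K_block - mu, axis=1).max()  # block radius r_K
    per_query = Q_block @ mu + np.linalg.norm(Q_block, axis=1) * r
    return per_query.max()

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # a block of 4 queries, head_dim 8
K = rng.normal(size=(16, 8))   # a block of 16 keys
ub = block_upper_bound(Q, K)
exact_max = (Q @ K.T).max()    # true maximum interaction score
assert exact_max <= ub + 1e-9  # the bound is never violated
```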
Implementation:
- Two Triton kernels: geometry computation + sparse attention
- _compute_block_geometry_kernel: Computes centroids and radii for Key blocks
- _geometric_sparse_attention_kernel: Online softmax with geometric pruning
- Full autograd support with backward pass
- Conditional Triton imports for cross-platform compatibility
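As a reference for what the geometry pass produces, here is a plain-NumPy sketch of the per-block centroid/radius computation (illustrating the semantics of `_compute_block_geometry_kernel`, not the Triton code itself; the function name and shapes are our assumptions):

```python
import numpy as np

def compute_block_geometry(K, block_size):
    """Split keys (n, d) into blocks and return per-block centroids and radii.
    Handles a ragged final block when n is not a multiple of block_size."""
    n, d = K.shape
    n_blocks = (n + block_size - 1) // block_size
    centroids = np.zeros((n_blocks, d))
    radii = np.zeros(n_blocks)
    for b in range(n_blocks):
        blk = K[b * block_size:(b + 1) * block_size]
        centroids[b] = blk.mean(axis=0)                              # mu_K
        radii[b] = np.linalg.norm(blk - centroids[b], axis=1).max()  # r_K
    return centroids, radii

# Identical keys collapse to a single point: radius must be zero.
K = np.ones((10, 4))
cents, rads = compute_block_geometry(K, block_size=4)
```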
Performance Targets:
- 4K seq uniform: ~1x (overhead from geometry)
- 32K seq clustered: ~4x speedup
- 128K seq typical: ~7x speedup
- 1M seq sparse: enables previously OOM workloads
Files:
- xformers/ops/aether_attention.py: Core operator (~500 LOC)
- xformers/ops/__init__.py: Export aether_attention, AetherAttention
- tests/test_aether_attention.py: Test suite
Signed-off-by: Teerth Sharma <teerthsharma@github.com>
- Add AETHER section to ops.rst with autodoc integration
- Expand test coverage from 8 to 30+ test cases across 9 test classes
- Add gradient correctness, determinism, edge case, and stress tests
- Add block geometry verification tests
- Match xFormers quality standards for operators
Hi @teerthsharma! Thank you for your pull request and welcome to our community.

**Action Required**

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process**

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
**Summary**

This PR introduces AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) Attention, a novel geometric sparse attention operator. AETHER achieves $O(N_{relevant})$ scaling by geometrically pruning key blocks that fall outside the active query manifold.

**Key Features**

- Geometric Pruning: Prunes blocks where the upper-bound interaction score is below a threshold.
- Triton Implementation: High-performance fused kernels for block geometry computation and sparse attention.
- Drop-in Replacement: Compatible with standard attention APIs.
**Architecture**

AETHER treats the Key cache as a set of geometric clusters. For each Key block we precompute a "Hyper-Bounding Box": the centroid $\mu_K$ and the radius $r_K$ (the largest distance from any key in the block to the centroid).

**1. The Inequality (The Gate)**

We use the Cauchy-Schwarz inequality to derive a strict upper bound on the dot product between a Query $q$ and any Key $k$ in a block:

$$q \cdot k = q \cdot \mu_K + q \cdot (k - \mu_K) \le q \cdot \mu_K + \|q\| \, r_K$$

If this upper bound is less than our threshold $\tau$, the entire block is safely skipped.

**2. System Flow**

```mermaid
graph TD
subgraph HBM [Phase 1: Heavy Memory HBM]
K[Input Keys K]
V[Input Values V]
end
subgraph Precalc [Phase 2: Geometric Indexing]
K -->|Split into Blocks| B[Key Blocks]
B -->|Compute Centroid and Radius| Meta[Geometric Metadata]
end
subgraph Runtime [Phase 3: The Filter Per Query]
Q[Input Query q]
Meta -->|Load to SRAM| Gate{Is Bound under Threshold?}
Q --> Gate
Gate -- YES Prune --> Skip[SKIP BLOCK 0 FLOPs]
Gate -- NO Keep --> Comp[COMPUTE Standard Attn]
end
Skip --> Out[Output]
Comp --> Out
```
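The Phase-3 gate can be sketched end-to-end in NumPy (function names are ours, and the real operator fuses this into a Triton kernel with an online softmax rather than materializing kept blocks): with a threshold low enough to keep every block, the result matches dense attention exactly.

```python
import numpy as np

def dense_attention(q, K, V):
    """Single-query softmax attention, numerically stabilized."""
    s = K @ q
    w = np.exp(s - s.max())
    return (w / w.sum()) @ V

def aether_like_attention(q, K, V, block_size, tau):
    """Keep only key blocks whose Cauchy-Schwarz bound reaches tau."""
    kept_K, kept_V = [], []
    q_norm = np.linalg.norm(q)
    for start in range(0, len(K), block_size):
        blk_K = K[start:start + block_size]
        mu = blk_K.mean(axis=0)
        r = np.linalg.norm(blk_K - mu, axis=1).max()
        if q @ mu + q_norm * r >= tau:   # the gate: bound under tau => skip
            kept_K.append(blk_K)
            kept_V.append(V[start:start + block_size])
    if not kept_K:                        # every block pruned
        return np.zeros(V.shape[1])
    return dense_attention(q, np.concatenate(kept_K), np.concatenate(kept_V))

rng = np.random.default_rng(1)
q = rng.normal(size=8)
K = rng.normal(size=(64, 8))
V = rng.normal(size=(64, 8))
out = aether_like_attention(q, K, V, block_size=16, tau=-1e9)  # keeps all blocks
assert np.allclose(out, dense_attention(q, K, V))
```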
**Description**

This PR introduces AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) Attention, a novel geometric sparse attention operator. AETHER achieves $O(N_{relevant})$ scaling by geometrically pruning key blocks that fall outside the active query manifold using Cauchy-Schwarz upper bounds on the dot product.
**Key Features**

- Geometric Pruning: Prunes blocks where the upper-bound interaction score is below a threshold.
- Triton Implementation: High-performance fused kernels for block geometry computation and sparse attention.
- Drop-in Replacement: Compatible with standard attention APIs.
**Changes**

- Added `xformers/ops/aether_attention.py`: Core implementation and Triton kernels.
- Added usage documentation to `docs/source/components/ops.rst`.
- Added comprehensive test suite in `tests/test_aether_attention.py` (matching xFormers quality standards).