feat(ops): Add AETHER Geometric Sparse Attention Operator#1370

Open
teerthsharma wants to merge 2 commits into facebookresearch:main from teerthsharma:feat/aether-geometric-sparse-attention

Conversation

@teerthsharma

Description
Summary

This PR introduces AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) Attention, a novel geometric sparse attention operator. AETHER achieves $O(N_{relevant})$ scaling by geometrically pruning key blocks that fall outside the active query manifold, using Cauchy-Schwarz upper bounds on the dot product.

Key Features

Geometric Pruning: Prunes blocks where the upper bound interaction score is below a threshold.
Triton Implementation: High-performance fused kernels for block geometry computation and sparse attention.
Drop-in Replacement: Compatible with standard attention APIs.
Changes

- Added xformers/ops/aether_attention.py: core implementation and Triton kernels.
- Added usage documentation to docs/source/components/ops.rst.
- Added a comprehensive test suite in tests/test_aether_attention.py (matching xFormers quality standards).

AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) introduces
geometric block-sparse attention that achieves O(N_relevant) scaling by pruning
blocks based on mathematical upper bounds.

Mathematical Foundation:
- Uses Cauchy-Schwarz inequality to compute block interaction upper bounds
- Score_UB(Q_block, K_block) = max_{q ∈ Q_block} (q · μ_K + ||q|| · r_K)
- Blocks with upper_bound < threshold are safely skipped (exact, not approximate)
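As a sanity check on the bound itself, here is a minimal NumPy sketch (hypothetical, not part of the PR's code) verifying that the centroid-plus-radius score upper-bounds the true maximum dot product over a random key block:

```python
import numpy as np

# Assumed setup: one query q and one key block K_blk of 128 keys in 64 dims.
rng = np.random.default_rng(0)
q = rng.normal(size=64)
K_blk = rng.normal(size=(128, 64))

mu = K_blk.mean(axis=0)                       # block centroid
r = np.linalg.norm(K_blk - mu, axis=1).max()  # block radius: max ||k - mu||

true_max = (K_blk @ q).max()                  # exact max_k (q . k)
upper_bound = q @ mu + np.linalg.norm(q) * r  # Score_UB for this (q, block)

# Cauchy-Schwarz: q.k = q.mu + q.(k - mu) <= q.mu + ||q|| * ||k - mu||
assert true_max <= upper_bound + 1e-9
```

If `upper_bound < threshold`, every key in the block is guaranteed to score below the threshold, which is why the skip is exact rather than approximate.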

Implementation:
- Two Triton kernels: geometry computation + sparse attention
- _compute_block_geometry_kernel: Computes centroids and radii for Key blocks
- _geometric_sparse_attention_kernel: Online softmax with geometric pruning
- Full autograd support with backward pass
- Conditional Triton imports for cross-platform compatibility
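For readers without a GPU handy, a reference (non-Triton) sketch of what `_compute_block_geometry_kernel` produces, assuming a simple mean centroid per key block; the actual kernel layout may differ:

```python
import numpy as np

def compute_block_geometry(K, block_size):
    """Reference version of the geometry pass (hypothetical helper name):
    for each block of keys, compute the centroid mu and the radius
    r = max ||k - mu|| over keys k in the block."""
    n, d = K.shape
    num_blocks = (n + block_size - 1) // block_size
    centroids = np.zeros((num_blocks, d))
    radii = np.zeros(num_blocks)
    for b in range(num_blocks):
        blk = K[b * block_size : (b + 1) * block_size]
        mu = blk.mean(axis=0)
        centroids[b] = mu
        radii[b] = np.linalg.norm(blk - mu, axis=1).max()
    return centroids, radii
```

The metadata is tiny (one vector and one scalar per block), which is what makes it cheap to stage in SRAM during the attention pass.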

Performance Targets:
- 4K seq uniform: ~1x (overhead from geometry)
- 32K seq clustered: ~4x speedup
- 128K seq typical: ~7x speedup
- 1M seq sparse: enables previously OOM workloads

Files:
- xformers/ops/aether_attention.py: Core operator (~500 LOC)
- xformers/ops/__init__.py: Export aether_attention, AetherAttention
- tests/test_aether_attention.py: Test suite

Signed-off-by: Teerth Sharma <teerthsharma@github.com>
- Add AETHER section to ops.rst with autodoc integration
- Expand test coverage from 8 to 30+ test cases across 9 test classes
- Add gradient correctness, determinism, edge case, and stress tests
- Add block geometry verification tests
- Match xFormers quality standards for operators
@meta-cla

meta-cla Bot commented Jan 20, 2026

Hi @teerthsharma!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 20, 2026
@meta-cla

meta-cla Bot commented Jan 20, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@teerthsharma
Author

Summary

This PR introduces AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) Attention, a novel geometric sparse attention operator. AETHER achieves $\mathcal{O}(N_{relevant})$ scaling by geometrically pruning key blocks that fall outside the active query manifold using Cauchy-Schwarz upper bounds.

Key Features

  • Geometric Pruning: Prunes blocks where the upper bound interaction score is below a threshold.
  • Triton Implementation: High-performance fused kernels for block geometry computation and sparse attention.
  • Drop-in Replacement: Compatible with standard attention APIs.
  • Zero-Overhead Backward: Gradients for pruned blocks are strictly zero, sparsifying the backward graph.

Architecture

AETHER treats the Key cache as a set of geometric clusters. We precompute the "Hyper-Bounding Box" (Centroid $\mu$ + Radius $r$) for every block of $K$ tokens.

1. The Inequality (The Gate)

We use the Cauchy-Schwarz Inequality to derive a strict upper bound for the dot product between a Query $q$ and any Key $k$ in a block:

$$\max_{k \in K_{\text{block}}}(q \cdot k) \le q \cdot \mu + \lVert q \rVert \cdot r$$

If this upper bound is less than our threshold $\tau$, the entire block is skipped at the kernel level.

2. System Flow

```mermaid
graph TD
    subgraph HBM [Phase 1: Heavy Memory HBM]
        K[Input Keys K]
        V[Input Values V]
    end

    subgraph Precalc [Phase 2: Geometric Indexing]
        K -->|Split into Blocks| B[Key Blocks]
        B -->|Compute Centroid and Radius| Meta[Geometric Metadata]
    end

    subgraph Runtime [Phase 3: The Filter Per Query]
        Q[Input Query q]
        Meta -->|Load to SRAM| Gate{Is Bound under Threshold?}
        Q --> Gate

        Gate -- YES Prune --> Skip[SKIP BLOCK 0 FLOPs]
        Gate -- NO Keep --> Comp[COMPUTE Standard Attn]
    end

    Skip --> Out[Output]
    Comp --> Out
```
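The three phases above can be sketched end to end in plain NumPy (a hypothetical single-query reference, not the Triton implementation; the function name and signature are illustrative):

```python
import numpy as np

def geometric_sparse_attention(q, K, V, block_size, tau):
    """Reference flow: compute per-block geometry, gate blocks whose
    Cauchy-Schwarz upper bound falls below tau, then run softmax
    attention over only the surviving keys."""
    n, d = K.shape
    keep = []
    for s in range(0, n, block_size):
        blk = K[s : s + block_size]
        mu = blk.mean(axis=0)
        r = np.linalg.norm(blk - mu, axis=1).max()
        if q @ mu + np.linalg.norm(q) * r >= tau:  # the gate
            keep.append((s, s + len(blk)))
    if not keep:
        return np.zeros(V.shape[1])
    idx = np.concatenate([np.arange(a, b) for a, b in keep])
    scores = K[idx] @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())  # stable softmax over kept keys
    w /= w.sum()
    return w @ V[idx]
```

With a very low threshold no block is pruned and the output matches dense softmax attention exactly; as `tau` rises, blocks whose upper bound cannot reach it are skipped with zero FLOPs.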
