Pull requests: Dao-AILab/flash-attention
Open pull requests:
[Cute,Flex,Fwd] Allow vectorized score_mod definitions (#2236), opened Feb 5, 2026 by reubenconducts
[ROCM] Add Infinity Cache (LLC) awareness for performance improvement [PR #2147 rebased on PR #2178] (#2217), opened Jan 29, 2026 by tianwyan
Add shift scheduler for deterministic full-mask FA3 bwd on Hopper (sm90) (#2207), opened Jan 23, 2026 by tie-pilot-qxw
Fix compute_block_sparsity import in benchmark_mask_mod (#2190), opened Jan 17, 2026 by blueberrycongee
[Cute,Fwd,Sm100] Support irregular qhead / kvhead ratios (#2186, draft), opened Jan 16, 2026 by timmy-feng
Update mha_fwd.cpp: normalize the commented-out parameters (#2160), opened Jan 9, 2026 by breakfei
Add FLASH_ATTENTION_FORCE_NON_STABLE_API option to allow building on the NVIDIA PyTorch 25.09 image (#2140), opened Jan 5, 2026 by jp-gr
[ROCM] Fix AMD Triton backend crash when dropout != 0 and return_attn_probs = False (#2111), opened Dec 30, 2025 by Logiquo