Just for s**ts & giggles I ran some simple tests using Flux.1 Dev and a selection of the diffusers attention backends (attn processors) to compare speeds. The results aren't too surprising given what we already know.
Environment: Windows 10 OS
Python 3.12.5
Latest diffusers & transformers libraries
PyTorch v2.8 CUDA build (NVIDIA CUDA Toolkit v12.8)
GPU: RTX 4090
64 GB of RAM
Quantized Transformer (INT8) and T5Encoder (INT8).
I used the same basic parameters throughout: same prompt, same CFG, no true CFG, 30 steps, FlowMatchEulerDiscreteScheduler, 1024x1024 image size.
For each selected attention backend I compiled the transformer, which adds time to the initial image generation. I ran each test separately so that compilation (and other cold-start factors) wouldn't skew the results. A rough sketch of the setup is below.
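If anyone wants to reproduce this, the setup looks roughly like the sketch below. It's not my exact script: torchao INT8 weight-only is just one possible way to get the INT8 transformer/T5, `set_attention_backend()` needs a recent diffusers release, and the flash/sage backends need their respective packages (flash-attn, sageattention) installed.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from diffusers import TorchAoConfig as DiffusersTorchAoConfig
from transformers import T5EncoderModel
from transformers import TorchAoConfig as TransformersTorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

# INT8 weight-only quantization via torchao -- one way to get the INT8
# transformer and T5 encoder; your quantization path may differ.
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=DiffusersTorchAoConfig("int8wo"),
    torch_dtype=dtype,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=TransformersTorchAoConfig("int8_weight_only"),
    torch_dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=dtype,
).to("cuda")

# Pick one attention backend per run. The names match the diffusers
# attention dispatcher: "flash", "flash_varlen", "sage", "sage_varlen",
# "xformers", "_native_cudnn", "_sage_qk_int8_pv_fp8_cuda", ...
pipe.transformer.set_attention_backend("sage")

# Compile the transformer; the first generation pays the compile cost.
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True)
```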
Here are my results, ordered from worst (slowest) to best (fastest):
| Attention backend (compiled transformer) | Total time (30 steps) | s/it |
| --- | --- | --- |
| _native_cudnn | 1:26 | 2.89 |
| flash_varlen | 1:20 | 2.69 |
| xformers | 1:05 | 2.18 |
| sage_varlen | 1:00 | 2.03 |
| flash | 0:50 | 1.69 |
| sage | 0:38 | 1.27 |
| _sage_qk_int8_pv_fp8_cuda | 0:34 | 1.14 |
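For reference, each timing above was read off the tqdm bar of a post-warm-up generation call along these lines (prompt, seed, and guidance value here are placeholders, not the exact ones I used):

```python
# Warm-up pass: triggers torch.compile, so the compile cost never lands
# in the timed run.
pipe(
    prompt="a photo of a cat",  # placeholder; same prompt for every backend
    guidance_scale=3.5,         # distilled guidance; no true CFG
    num_inference_steps=30,
    height=1024,
    width=1024,
    generator=torch.Generator("cuda").manual_seed(0),
)

# Timed pass: the s/it figures come from the progress bar of a run like
# this. FlowMatchEulerDiscreteScheduler is FluxPipeline's default scheduler.
image = pipe(
    prompt="a photo of a cat",
    guidance_scale=3.5,
    num_inference_steps=30,
    height=1024,
    width=1024,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("flux_attn_benchmark.png")
```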