Just for s**ts & giggles I ran some simple tests using Flux.1 Dev and a selection of the diffusers attention backends (attn processors) to compare speeds. The results aren't too surprising given what we already know.
Environment: Windows 10 OS
Python 3.12.5
Latest diffusers & transformers libraries
PyTorch v2.8 CUDA build (NVIDIA CUDA Toolkit v12.8)
GPU: RTX 4090
64 GB of RAM
Quantized Transformer (INT8) and T5Encoder (INT8).
I used the same basic parameters throughout: same prompt, same CFG, no true CFG, 30 steps, FlowMatchEulerDiscreteScheduler, 1024x1024 image size.
For each selected attention backend I compiled the transformer, which adds time to the initial image generation. I ran each test separately so that compilation (and other cold-start factors) wouldn't skew the results. A rough sketch of the setup is below.
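If anyone wants to reproduce this, the setup looks roughly like the sketch below. It's not my exact script: torchao INT8 weight-only is just one possible way to get the INT8 transformer/T5, `set_attention_backend()` needs a recent diffusers release, and the flash/sage backends need their respective packages (flash-attn, sageattention) installed.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from diffusers import TorchAoConfig as DiffusersTorchAoConfig
from transformers import T5EncoderModel
from transformers import TorchAoConfig as TransformersTorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

# INT8 weight-only quantization via torchao -- one way to get the INT8
# transformer and T5 encoder; your quantization path may differ.
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=DiffusersTorchAoConfig("int8wo"),
    torch_dtype=dtype,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=TransformersTorchAoConfig("int8_weight_only"),
    torch_dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=dtype,
).to("cuda")

# Pick one attention backend per run. The names match the diffusers
# attention dispatcher: "flash", "flash_varlen", "sage", "sage_varlen",
# "xformers", "_native_cudnn", "_sage_qk_int8_pv_fp8_cuda", ...
pipe.transformer.set_attention_backend("sage")

# Compile the transformer; the first generation pays the compile cost.
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True)
```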
Here are my results, ordered from worst (slowest) to best (fastest):
| Attention backend (compiled transformer) | Total time (30 steps) | s/it |
| --- | --- | --- |
| _native_cudnn | 1:26 | 2.89 |
| flash_varlen | 1:20 | 2.69 |
| xformers | 1:05 | 2.18 |
| sage_varlen | 1:00 | 2.03 |
| flash | 0:50 | 1.69 |
| sage | 0:38 | 1.27 |
| _sage_qk_int8_pv_fp8_cuda | 0:34 | 1.14 |
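For reference, each timing above was read off the tqdm bar of a post-warm-up generation call along these lines (prompt, seed, and guidance value here are placeholders, not the exact ones I used):

```python
# Warm-up pass: triggers torch.compile, so the compile cost never lands
# in the timed run.
pipe(
    prompt="a photo of a cat",  # placeholder; same prompt for every backend
    guidance_scale=3.5,         # distilled guidance; no true CFG
    num_inference_steps=30,
    height=1024,
    width=1024,
    generator=torch.Generator("cuda").manual_seed(0),
)

# Timed pass: the s/it figures come from the progress bar of a run like
# this. FlowMatchEulerDiscreteScheduler is FluxPipeline's default scheduler.
image = pipe(
    prompt="a photo of a cat",
    guidance_scale=3.5,
    num_inference_steps=30,
    height=1024,
    width=1024,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("flux_attn_benchmark.png")
```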