
Commit 6ea3866

Ubuntu and claude committed
Add fused butterfly passes to spqlios-intl FFT
Implement stage fusion in the interleaved FFT:
- Fused size-1+2+4 pass (3 butterfly stages in-register)
- Fused paired stages (Option C/D) for middle halfnn values
- Fused last butterfly + twist pass

Correctness: passes polymul and externalproduct tests.

Performance (tfhe-rs DEFAULT_PARAMETERS, N=512, k=3, uint64_t):
  BR: 15.87ms (unfused was 16.50ms, split SPQLIOS 14.95ms, tfhe-rs 14.01ms)

The interleaved C++ with intrinsics is ~2x slower than split SPQLIOS's
hand-tuned assembly per FFT call. The format advantage (fewer load/store
streams) is negated by the compiler's code generation vs hand-written asm.
Closing the gap would require hand-written assembly for the interleaved
butterfly, equivalent to rewriting concrete-fft in x86 asm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
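To illustrate what "stage fusion" means in general, here is a minimal scalar sketch that fuses the first two radix-2 butterfly stages into a single pass over the data. It is not the code from this commit: the function names (`butterfly_stage`, `stages_1_2_unfused`, `stages_1_2_fused`, `max_abs_diff`), the use of `std::complex<double>` instead of the interleaved layout with intrinsics, and the twiddle convention `exp(-i*pi*j/half)` are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: one radix-2 butterfly stage over the whole array.
// Each block of 2*half elements combines x[j] with x[j+half] * w_j,
// where w_j = exp(-i*pi*j/half) (assumed twiddle convention).
void butterfly_stage(std::vector<cplx>& x, std::size_t half) {
    assert(x.size() % (2 * half) == 0);
    const double pi = std::acos(-1.0);
    for (std::size_t base = 0; base < x.size(); base += 2 * half) {
        for (std::size_t j = 0; j < half; ++j) {
            const cplx w = std::polar(1.0, -pi * double(j) / double(half));
            const cplx a = x[base + j];
            const cplx b = x[base + j + half] * w;
            x[base + j] = a + b;
            x[base + j + half] = a - b;
        }
    }
}

// Unfused reference: two full passes over memory (half = 1, then half = 2).
void stages_1_2_unfused(std::vector<cplx>& x) {
    butterfly_stage(x, 1);
    butterfly_stage(x, 2);
}

// Fused version: load four values once, run both stages on locals,
// store once. The intermediate results never go back to memory.
void stages_1_2_fused(std::vector<cplx>& x) {
    assert(x.size() % 4 == 0);
    for (std::size_t base = 0; base < x.size(); base += 4) {
        const cplx x0 = x[base], x1 = x[base + 1];
        const cplx x2 = x[base + 2], x3 = x[base + 3];
        // Stage with half = 1: the only twiddle is 1.
        const cplx a0 = x0 + x1, a1 = x0 - x1;
        const cplx a2 = x2 + x3, a3 = x2 - x3;
        // Stage with half = 2: twiddles are 1 and -i.
        // Multiplying (re, im) by -i gives (im, -re), so no multiply is needed.
        const cplx b3(a3.imag(), -a3.real());
        x[base] = a0 + a2;
        x[base + 1] = a1 + b3;
        x[base + 2] = a0 - a2;
        x[base + 3] = a1 - b3;
    }
}

// Largest element-wise difference between two equally sized arrays.
double max_abs_diff(const std::vector<cplx>& a, const std::vector<cplx>& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        m = std::max(m, std::abs(a[i] - b[i]));
    }
    return m;
}
```

Calling `stages_1_2_unfused` and `stages_1_2_fused` on copies of the same input and comparing them with `max_abs_diff` should give agreement up to floating-point rounding. The fused variant makes one load/store sweep instead of two, which is the kind of memory-traffic saving the fused passes above aim for; whether it translates into wall-clock gains depends on the compiler's code generation, as the performance numbers in the commit message show.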
1 parent 6314281 commit 6ea3866


1 file changed: +365 -389 lines changed


0 commit comments
