Commit 6ea3866
Add fused butterfly passes to spqlios-intl FFT
Implement stage fusion in the interleaved FFT:
- Fused size-1+2+4 pass (3 butterfly stages in-register)
- Fused paired stages (Option C/D) for middle halfnn values
- Fused last butterfly + twist pass
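For illustration only, the idea behind the fused size-1+2+4 pass can be sketched in scalar C++. This is not the code from this commit: the function names, the use of std::complex, and the twiddle convention exp(-i*pi*j/h) are all assumptions, and the real implementation uses intrinsics on the interleaved layout. The stages with half-sizes 1, 2 and 4 only mix elements inside aligned blocks of 8, so each block can be loaded once, run through all three stages locally, and stored once.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// One radix-2 butterfly stage with half-size h over data[0..n).
// The twiddle convention here is an assumption made for this sketch.
static void butterfly_stage(cplx* data, std::size_t n, std::size_t h) {
    const double pi = std::acos(-1.0);
    for (std::size_t base = 0; base < n; base += 2 * h) {
        for (std::size_t j = 0; j < h; ++j) {
            const cplx w = std::polar(1.0, -pi * double(j) / double(h));
            const cplx a = data[base + j];
            const cplx b = data[base + j + h] * w;
            data[base + j] = a + b;
            data[base + j + h] = a - b;
        }
    }
}

// Unfused: three full sweeps over the array, one per stage.
void stages_unfused(std::vector<cplx>& v) {
    assert(v.size() % 8 == 0);
    butterfly_stage(v.data(), v.size(), 1);
    butterfly_stage(v.data(), v.size(), 2);
    butterfly_stage(v.data(), v.size(), 4);
}

// Fused: one sweep; each block of 8 is loaded once, passes through
// all three stages in a local buffer, and is stored once.
void stages_fused(std::vector<cplx>& v) {
    assert(v.size() % 8 == 0);
    for (std::size_t base = 0; base < v.size(); base += 8) {
        cplx r[8];
        for (std::size_t i = 0; i < 8; ++i) r[i] = v[base + i];
        butterfly_stage(r, 8, 1);
        butterfly_stage(r, 8, 2);
        butterfly_stage(r, 8, 4);
        for (std::size_t i = 0; i < 8; ++i) v[base + i] = r[i];
    }
}
```

Both functions perform the same butterflies; the fused one only changes the traversal order so that intermediate values never travel back to memory between the three stages.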
Correctness: passes polymul and externalproduct tests.
Performance (tfhe-rs DEFAULT_PARAMETERS, N=512, k=3, uint64_t):
BR: 15.87 ms (unfused: 16.50 ms; split SPQLIOS: 14.95 ms; tfhe-rs: 14.01 ms)
Per FFT call, the interleaved C++ with intrinsics is still ~2x slower
than split SPQLIOS's hand-tuned assembly. The interleaved format's
advantage (fewer load/store streams) is outweighed by the quality gap
between compiler-generated code and hand-written asm.
Closing the gap would require hand-written assembly for the interleaved
butterfly, which amounts to rewriting concrete-fft in x86 asm.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent commit: 6314281
1 file changed, 365 insertions(+), 389 deletions(-)