Optimize FFT logic for complex strided input to avoid oversized memory allocation by vlad-perevezentsev · Pull Request #2939 · IntelPython/dpnp

vlad-perevezentsev · 2026-06-01T10:55:12Z

This PR fixes an issue discovered while implementing #2927 where several FFT tests started failing after adding validation for stride configurations with oversized memory footprints.

The problem was that FFT could allocate output arrays using the same strided layout as a non-contiguous input. For inputs such as a[::2, this resulted in oversized allocations followed by an additional copy to a contiguous array.

This fix checks whether the memory footprint implied by the input strides exceeds the number of elements in the array. If so, the input is copied to a contiguous layout before configuring the FFT descriptor, allowing oneMKL FFT to produce a contiguous output directly.
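
The check described above can be sketched roughly as follows. This is not the code from the PR: NumPy is used as a stand-in for dpnp so the example runs without a SYCL device, and the helper name `needs_contiguous_copy` is hypothetical. NumPy reports strides in bytes, so they are converted to element units first.

```python
import numpy as np


def needs_contiguous_copy(a):
    """Return True if the strides span more elements than the array holds.

    Hypothetical helper for illustration only.
    """
    # NumPy strides are in bytes; convert them to element units.
    strides = [st // a.itemsize for st in a.strides]
    # Largest element offset reachable through strictly positive strides.
    max_disp = sum(st * (sh - 1) for st, sh in zip(strides, a.shape) if st > 0)
    return (max_disp + 1) > a.size


base = np.arange(16, dtype=np.complex64)
strided = base[::2]  # 8 elements spread over 15 element slots

print(needs_contiguous_copy(base))     # contiguous input
print(needs_contiguous_copy(strided))  # strided input
```

For the contiguous array the footprint equals the size, so no copy is needed. For the strided view the footprint (15 elements) exceeds the size (8 elements), which is the case the PR description says is copied to a contiguous layout first.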

By avoiding oversized allocations and the extra copy, this significantly improves the performance of all FFT operations in dpnp with complex strided inputs.

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
Have you added documentation for your changes, if necessary?
Have you added your changes to the changelog?

github-actions · 2026-06-01T11:52:34Z

View rendered docs @ https://intelpython.github.io/dpnp/pull/2939/index.html

coveralls · 2026-06-01T12:20:46Z

coverage: 78.255% (+0.007%) from 78.248% — fix_fft_logic into master

github-actions · 2026-06-04T12:42:21Z

Array API standard conformance tests for dpnp=0.21.0dev0=py313h509198e_60 ran successfully.
Passed: 1356
Failed: 4
Skipped: 16

antonwolfy · 2026-06-12T12:23:17Z

+            # If so, copy to contiguous to avoid oversized allocation
+            # for the output array and unnecessary copy to contiguous
+            # after oneMKL FFT
+            _strides = dpnp.get_usm_ndarray(a).strides


Why do we need to call get_usm_ndarray here?

antonwolfy · 2026-06-12T12:25:44Z

+            _shape = a.shape
+            # Max element displacement reachable by the strides.
+            # Negative strides are handled by _copy_array, so only
+            # positive strides are possible here


There is another regression also, which can be handled separately:

    import dpnp as np
    a = np.arange(4, dtype='c8')
    b = np.broadcast_to(a, (3, 4))
    np.fft.fft(b, axis=0)

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[5], line 1
    ----> 1 np.fft.fft(b, axis=0)

    File ~/code/dpnp/dpnp/fft/dpnp_iface_fft.py:122, in fft(a, n, axis, norm, out)
         47 """
         48 Compute the one-dimensional discrete Fourier Transform.
         49
       (...)
        118
        119 """
        121 dpnp.check_supported_arrays_type(a)
    --> 122 return dpnp_fft(
        123     a, forward=True, real=False, n=n, axis=axis, norm=norm, out=out
        124 )

    File ~/code/dpnp/dpnp/fft/dpnp_utils_fft.py:664, in dpnp_fft(a, forward, real, n, axis, norm, out)
        658 if c2r:
        659     # input array should be Hermitian for c2r FFT
        660     a = _make_array_hermitian(
        661         a, axis, dpnp.are_same_logical_tensors(a, a_orig)
        662     )
    --> 664 return _fft(
        665     a,
        666     norm=norm,
        667     out=out,
        668     forward=forward,
        669     # TODO: currently in-place is only implemented for c2c, see SAT-7154
        670     in_place=in_place and c2c,
        671     c2c=c2c,
        672     axes=axis,
        673     batch_fft=a_ndim != 1,
        674 )

    File ~/code/dpnp/dpnp/fft/dpnp_utils_fft.py:447, in _fft(a, norm, out, forward, in_place, c2c, axes, batch_fft)
        443 a_strides = _standardize_strides_to_nonzero(strides, a.shape)
        444 dsc, out_strides = _commit_descriptor(
        445     a, forward, in_place, c2c, a_strides, index, batch_fft
        446 )
    --> 447 res = _compute_result(dsc, a, out, forward, c2c, out_strides)
        448 res = _scale_result(res, a.shape, norm, forward, index)
        450 # Revert swapped axes

    File ~/code/dpnp/dpnp/fft/dpnp_utils_fft.py:239, in _compute_result(dsc, a, out, forward, c2c, out_strides)
        231 result = dpnp_array(
        232     out_shape,
        233     dtype=out_dtype,
       (...)
        236     sycl_queue=exec_q,
        237 )
        238 res_usm = result.get_array()
    --> 239 ht_fft_event, fft_event = fi._fft_out_of_place(
        240     dsc, a_usm, res_usm, forward, depends=dep_evs
        241 )
        242 _manager.add_event_pair(ht_fft_event, fft_event)
        244 if not isinstance(result, dpnp_array):

    ValueError: Memory addressed by the output array is not sufficiently ample.

antonwolfy · 2026-06-12T12:27:48Z

+            _shape = a.shape
+            # Max element displacement reachable by the strides.
+            # Negative strides are handled by _copy_array, so only
+            # positive strides are possible here


Zero strides, which arise from broadcasting, will also take that path, so the comment is not fully correct.
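
To illustrate the point about zero strides, here is a small sketch using NumPy as a stand-in for dpnp (an assumption; dpnp is expected to report equivalent strides for a broadcast view, but that is not verified here). It applies the `st > 0` filter used in this PR's footprint check to a broadcast array.

```python
import numpy as np

a = np.arange(4, dtype=np.complex64)
b = np.broadcast_to(a, (3, 4))

# Strides in element units; the broadcast axis has stride 0.
elem_strides = tuple(st // b.itemsize for st in b.strides)

# Same filter as in the footprint check: only strictly positive strides count.
max_disp = sum(
    st * (sh - 1) for st, sh in zip(elem_strides, b.shape) if st > 0
)
exceeds_size = (max_disp + 1) > b.size

print(elem_strides, max_disp, exceeds_size)
```

The zero stride is neither negative nor positive, so it reaches the check and is skipped by the filter. Under these assumptions the footprint (4 elements) is smaller than the size (12 elements), so a comment stating that only positive strides are possible at this point does not describe the broadcast case.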

antonwolfy · 2026-06-12T12:41:37Z

+            max_disp = sum(
+                st * (sh - 1) for st, sh in zip(_strides, _shape) if st > 0
+            )
+            if (max_disp + 1) > a.size:


It'd be helpful to add dedicated tests covering both copy path and no-copy path with transposed/F-contig complex input.
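
One possible shape for such tests is sketched below. It is only an illustration: NumPy stands in for dpnp so the example is runnable anywhere, `footprint_exceeds_size` and `make_cases` are hypothetical names, and a real dpnp test would compare `dpnp.fft.fft` against `numpy.fft.fft` instead of comparing NumPy with itself.

```python
import numpy as np


def footprint_exceeds_size(a):
    """Hypothetical mirror of the footprint check, in element units."""
    strides = [st // a.itemsize for st in a.strides]
    max_disp = sum(st * (sh - 1) for st, sh in zip(strides, a.shape) if st > 0)
    return (max_disp + 1) > a.size


def make_cases():
    """Map a case name to (input array, whether a copy is expected)."""
    base = np.arange(24, dtype=np.complex64).reshape(4, 6)
    return {
        "c_contig": (base, False),
        "transposed": (base.T, False),
        "f_contig": (np.asfortranarray(base), False),
        "row_step": (base[::2], True),
        "col_step": (base[:, ::2], True),
    }


def test_copy_and_no_copy_paths():
    for name, (arr, expect_copy) in make_cases().items():
        # The expected path is selected for each layout.
        assert footprint_exceeds_size(arr) == expect_copy, name
        # The FFT result must not depend on which path is taken.
        expected = np.fft.fft(np.ascontiguousarray(arr), axis=0)
        assert np.allclose(np.fft.fft(arr, axis=0), expected), name


test_copy_and_no_copy_paths()
```

Transposed and F-contiguous inputs are non-C-contiguous but still dense, so they exercise the no-copy path, while the stepped slices exercise the copy path.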

antonwolfy · 2026-06-12T12:48:37Z

-        if (
-            dpnp.is_cuda_backend(a) and not a.flags.c_contiguous
-        ):  # pragma: no cover
+        if dpnp.is_cuda_backend(a):  # pragma: no cover


Previously there was no copy for CUDA branch if batch_fft=False.
I guess it's an intended change, based on the above comment that C-contig is a requirement for cuFFT without any exception. Then we need to at least update the PR description to mention that.

vlad-perevezentsev added 2 commits June 1, 2026 03:26

Optimize FFT for strided input to avoid oversized allocation

3682370

Merge master into fix_fft_logic

881cf69

vlad-perevezentsev added this to the 0.21.0 release milestone Jun 1, 2026

vlad-perevezentsev self-assigned this Jun 1, 2026

vlad-perevezentsev requested review from antonwolfy and ndgrigorian as code owners June 1, 2026 10:55

vlad-perevezentsev changed the title ~~Optimize FFT logic for strided input to avoid oversized memory allocation~~ Optimize FFT logic for complex strided input to avoid oversized memory allocation Jun 1, 2026

Update changelog

fbb13bd

vlad-perevezentsev mentioned this pull request Jun 1, 2026

Fix missing strides validation in dpnp.tensor.usm_ndarray #2927

Draft

7 tasks

ndgrigorian reviewed Jun 2, 2026

Comment thread dpnp/fft/dpnp_utils_fft.py

vlad-perevezentsev added 2 commits June 4, 2026 04:24

Update comment to clarify negative strides handling

7d1d7a2

Merge remote-tracking branch 'origin/master' into fix_fft_logic

12d3739

vlad-perevezentsev requested a review from ndgrigorian June 4, 2026 11:26

antonwolfy reviewed Jun 12, 2026

vlad-perevezentsev wants to merge 5 commits into master from fix_fft_logic

4 participants
