Adds OpenMP to qsort, should also improve test speed a bit #179
Merged
r-devulap merged 7 commits into numpy:main on Mar 28, 2025
9500eb4 to 8ddc73a
Member
I assume this was benchmarked on an SKX?
r-devulap (Member) previously approved these changes on Mar 25, 2025 and left a comment:
LGTM. I noticed you mostly reuse the thresholds from the key-value sort. Did you get a chance to experiment with them?
Contributor
Author
Oops, that int16 slowness seems to be a real issue. Let me fix that quickly.
8ddc73a to dba705f
Contributor
Author
Okay, this should be fixed now. Here are some benchmarks from my SPR machine: SPR Benchmarks
r-devulap approved these changes on Mar 28, 2025
r-devulap pushed a commit to r-devulap/numpy that referenced this pull request on Apr 1, 2025
Pulls in 2 major changes: (1) Fixes a performance regression on 16-bit dtype sorting (see numpy/x86-simd-sort#190). (2) Adds OpenMP support for quicksort, which speeds up sorting arrays larger than 100,000 elements by up to 3x (see numpy/x86-simd-sort#179).
r-devulap pushed a commit to r-devulap/numpy that referenced this pull request on Apr 15, 2025, with the same commit message.
r-devulap pushed a commit to r-devulap/numpy that referenced this pull request on May 1, 2025, with the same commit message.
MaanasArora pushed a commit to MaanasArora/numpy that referenced this pull request on Jul 10, 2025, with the same commit message.
This adds OpenMP acceleration to quicksort and makes some changes to the testing code. On my current test system, this gives up to a 3x speedup, though I think more may be achievable on a stronger system.
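For readers unfamiliar with the general idea, here is a rough sketch of how quicksort can be parallelized with OpenMP tasks. It is not the code from this PR: it uses a scalar `std::partition` in place of the library's SIMD partition, and the names `qsort_task`, `parallel_qsort`, and `kTaskThreshold` are hypothetical. The cutoff value is only a placeholder; the thresholds actually used in the PR are not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical cutoff (an assumption, not the PR's value): ranges smaller
// than this are sorted serially, since spawning a task would cost more
// than it saves.
constexpr std::size_t kTaskThreshold = 100000;

// Quicksort on the half-open range [lo, hi) of arr.
template <typename T>
void qsort_task(T *arr, std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2) return;
    if (hi - lo < kTaskThreshold) {
        std::sort(arr + lo, arr + hi); // serial fallback for small ranges
        return;
    }

    // Median-of-three pivot value taken from the range itself.
    T a = arr[lo], b = arr[lo + (hi - lo) / 2], c = arr[hi - 1];
    T pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

    // Three-way split: [lo, mid1) < pivot, [mid1, mid2) == pivot,
    // [mid2, hi) > pivot. The middle part is never empty, so both
    // recursive ranges are strictly smaller than the input range.
    T *mid1 = std::partition(arr + lo, arr + hi,
                             [&](const T &x) { return x < pivot; });
    T *mid2 = std::partition(mid1, arr + hi,
                             [&](const T &x) { return !(pivot < x); });
    std::size_t left_end = static_cast<std::size_t>(mid1 - arr);
    std::size_t right_begin = static_cast<std::size_t>(mid2 - arr);

    // Hand the left half to another thread; sort the right half here.
#pragma omp task firstprivate(arr, lo, left_end)
    qsort_task(arr, lo, left_end);

    qsort_task(arr, right_begin, hi);

    // Wait for the child task before returning to the caller.
#pragma omp taskwait
}

// Entry point: one thread starts the recursion, the rest pick up tasks.
template <typename T>
void parallel_qsort(std::vector<T> &v)
{
    T *data = v.data();
    std::size_t n = v.size();
#pragma omp parallel
#pragma omp single nowait
    qsort_task(data, 0, n);
}
```

Calling `parallel_qsort(v)` on a `std::vector<int>` sorts it in place. When compiled without OpenMP enabled (for example without `-fopenmp` on GCC or Clang), the pragmas are ignored and the same code runs serially, which is one reason small inputs need not pay any threading overhead.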
Benchmarks:
10m
1m
And to show that there does not seem to be a regression with small sizes due to the SIMD logic:
128
In addition, this adds larger tests for quicksort to test the OpenMP logic. Out of concern for the runtime of the test suite, I modified both these new tests and the older kv tests to only use the larger sizes for the main sort, the only one that uses OpenMP and thus needs them. I believe this should result in a noticeable net reduction in the runtime of the test suite.
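The test-size selection described above could look roughly like the following. This is only an illustration under assumptions: the `Routine` enum, the `test_sizes` helper, and every size value are hypothetical placeholders, not the names or sizes used in the actual test suite.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical labels for the routines under test.
enum class Routine { FullSort, Select, PartialSort };

// Returns the array sizes to test for a given routine. Only the full
// sort is assumed to use OpenMP, so only it gets the large sizes that
// are needed to exercise the parallel path.
inline std::vector<std::size_t> test_sizes(Routine r)
{
    // Placeholder small sizes shared by every routine.
    std::vector<std::size_t> sizes = {0, 1, 2, 15, 64, 1000, 10000};
    if (r == Routine::FullSort) {
        // Placeholder large size, chosen to exceed any task threshold.
        sizes.push_back(1000000);
    }
    return sizes;
}
```

Keeping the large sizes out of the other routines is what would reduce total test time: they still get full coverage of the small-size code paths without repeatedly sorting very large arrays.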