Optimize horizontal_sample performance #2894
Merged
197g merged 16 commits into image-rs:main on Apr 11, 2026
Signed-off-by: Isotr0py <2037008807@qq.com>
197g reviewed Mar 29, 2026
Comment on lines +273 to +276
    let max_ks = (2.0 * src_support).ceil() as usize + 2;
    // Preallocated buffer for precomputed weights.
    let mut col_ws: Vec<f32> =
        vec_try_with_capacity(col_count * max_ks).expect("capacity overflow in horizontal_sample");
Member
Some kernels can be very large. This will definitely blow through those limits. There may be a way to write it as a fast-path but the case where this space-time tradeoff is inapplicable must be considered.
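For readers following the thread, the space-time tradeoff in question can be sketched roughly as below: the filter weights depend only on the output column, so they can be computed once and reused for every row, at the cost of holding up to col_count * max_ks values in memory. This is a minimal sketch and not the code from the PR. It assumes a single-channel f32 image with non-zero widths and a triangle kernel as a stand-in filter, and the names Taps, precompute_taps and horizontal_resample are hypothetical.

```rust
/// Triangle kernel with support 1.0, used here only as a stand-in filter.
fn triangle(x: f32) -> f32 {
    (1.0 - x.abs()).max(0.0)
}

/// Taps for one output column: first source index and normalized weights.
/// (Hypothetical helper type, not part of the image crate.)
struct Taps {
    left: usize,
    weights: Vec<f32>,
}

/// Compute the taps for every output column once. They depend only on the
/// column index and the two widths, never on the row being processed.
/// Assumes `src_w > 0`.
fn precompute_taps(src_w: usize, dst_w: usize) -> Vec<Taps> {
    let ratio = src_w as f32 / dst_w as f32;
    let sratio = ratio.max(1.0);
    let src_support = 1.0 * sratio; // the triangle kernel has support 1.0
    (0..dst_w)
        .map(|outx| {
            let center = (outx as f32 + 0.5) * ratio;
            let left = ((center - src_support).floor().max(0.0) as usize).min(src_w - 1);
            let right = ((center + src_support).ceil() as usize).clamp(left + 1, src_w);
            let mut weights: Vec<f32> = (left..right)
                .map(|i| triangle((i as f32 + 0.5 - center) / sratio))
                .collect();
            // Normalize so that the weights of each column sum to 1.
            let sum: f32 = weights.iter().sum();
            weights.iter_mut().for_each(|w| *w /= sum);
            Taps { left, weights }
        })
        .collect()
}

/// Resample every row of a single-channel image with the shared taps.
fn horizontal_resample(src: &[f32], src_w: usize, dst_w: usize) -> Vec<f32> {
    let taps = precompute_taps(src_w, dst_w);
    let mut dst = Vec::with_capacity(src.len() / src_w * dst_w);
    for row in src.chunks_exact(src_w) {
        for t in &taps {
            // Weighted sum over the source pixels covered by this column.
            let acc: f32 = t.weights.iter().zip(&row[t.left..]).map(|(w, p)| w * p).sum();
            dst.push(acc);
        }
    }
    dst
}
```

The cache in this sketch grows with both the output width and the kernel size, which is the growth the review comment is concerned about.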
Contributor
Author
After some thought, I think we can limit the maximum memory usage by processing in batches. When the kernel size is very large, we can split the row into multiple batches of columns and process them one at a time, so that the weight cache stays small enough to fit.
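The batching idea could look roughly like the sketch below: pick a fixed budget for the weight cache, derive how many output columns fit into it for the current kernel size, and walk the row in column ranges of that length. The budget constant MAX_CACHED_WEIGHTS and the helper names are assumptions made for illustration; the actual limit and structure chosen in the PR are not shown in this thread. Only the max_ks formula is taken from the reviewed snippet.

```rust
use std::ops::Range;

/// Hypothetical budget for the weight cache, counted in `f32` values.
/// The real limit used by the PR is not shown in this thread.
const MAX_CACHED_WEIGHTS: usize = 4096;

/// Upper bound on kernel taps per output column, as in the reviewed snippet.
fn max_kernel_size(src_support: f32) -> usize {
    (2.0 * src_support).ceil() as usize + 2
}

/// Split `col_count` output columns into batches whose weight cache
/// (`batch_len * max_ks` values) stays within `budget`. A batch always
/// holds at least one column, so a kernel larger than the whole budget
/// still makes progress (and is the only case that exceeds the budget).
fn column_batches(col_count: usize, max_ks: usize, budget: usize) -> Vec<Range<usize>> {
    let per_batch = (budget / max_ks.max(1)).max(1);
    (0..col_count)
        .step_by(per_batch)
        .map(|start| start..(start + per_batch).min(col_count))
        .collect()
}
```

A caller would allocate one buffer of at most per-batch size, then for each range clear it, fill in the weights for those columns, and apply them to all rows before moving on. With a small kernel this degenerates to a single batch, so the fast path keeps its full benefit.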
Contributor
Author
Performance gains after removing
197g reviewed Apr 4, 2026
Member
197g left a comment
The batch approach looks promising.
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Aurelia Molzer <5550310+197g@users.noreply.github.com>
197g reviewed Apr 5, 2026
Co-authored-by: Aurelia Molzer <5550310+197g@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
197g approved these changes Apr 11, 2026
Introduction
This PR aims to optimize horizontal_sample's performance:
- Precompute the weights into col_ws, so that we can avoid ws recomputation
Tests
Test logs
Benchmark
Compare against main:
Bench logs