Conversation
57% performance improvement: the benchmark went from 237 ms to 102 ms. Two optimizations were applied to `correlation_models.rs`:

- `rval_from_distances`: replaced the two-pass computation (separate `a` and `b` arrays with inner `mapv().product()` allocations) with a single-pass scalar loop, so no intermediate arrays are allocated.
- `_jac_helper` → `_jac_from_r`: replaced the O(n·d²·h²) nested "product-excluding-one-factor" loop + einsum with a closed-form O(n·d·h) formula. Since the Matern 5/2 polynomial is always positive, the excluded product can be computed by division: `total_product / single_factor`.
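To illustrate the division trick described above, here is a minimal standalone sketch. It is not the code from `correlation_models.rs`: the function names are hypothetical, plain slices stand in for the ndarray types, and the Matern 5/2 polynomial factor is assumed to take the usual form 1 + √5·θ·|d| + (5/3)·θ²·d².

```rust
/// Hypothetical helper: Matern 5/2 polynomial factor for one dimension,
/// assumed to be 1 + sqrt(5)*theta*|d| + (5/3)*theta^2*d^2.
/// For theta >= 0 it is always >= 1, so dividing by it is safe.
fn matern52_factor(theta: f64, d: f64) -> f64 {
    let x = 5f64.sqrt() * theta * d.abs();
    1.0 + x + x * x / 3.0
}

/// Naive O(d^2) version: for each index j, multiply every factor except factor j.
fn excluded_products_naive(factors: &[f64]) -> Vec<f64> {
    (0..factors.len())
        .map(|j| {
            factors
                .iter()
                .enumerate()
                .filter(|(k, _)| *k != j)
                .map(|(_, &f)| f)
                .product::<f64>()
        })
        .collect()
}

/// O(d) version: compute the total product once, then divide out each factor.
/// Only valid when no factor is zero, which holds for the factors above.
fn excluded_products_by_division(factors: &[f64]) -> Vec<f64> {
    let total: f64 = factors.iter().product();
    factors.iter().map(|&f| total / f).collect()
}

fn main() {
    // Illustrative (theta, distance) pairs, not taken from the benchmark.
    let inputs = [(0.5, 1.0), (2.0, -0.3), (1.5, 0.0)];
    let factors: Vec<f64> = inputs
        .iter()
        .map(|&(theta, d)| matern52_factor(theta, d))
        .collect();

    let naive = excluded_products_naive(&factors);
    let fast = excluded_products_by_division(&factors);
    println!("naive: {:?}", naive);
    println!("fast:  {:?}", fast);
}
```

The two versions agree up to floating-point rounding. The division form would break if a factor could be zero, which is why the positivity of the polynomial matters.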
The changes for SE and AE show marginal improvement (criterion reports "change within noise threshold", and "no change in performance detected" for AE). This is expected: these kernels were already simpler than the Matern ones. The main optimization (`powf(2.)` → `v * v` and shared `theta_w` computation) eliminates unnecessary allocations and expensive transcendental calls, but since the exponential kernels lack the O(n·d²·h²) product-excluding-one-factor structure that made Matern so costly, the absolute gains are modest.

Summary of optimizations applied:

- SquaredExponential: replaced all `powf(F::cast(2.))` with `v * v` (avoids the log + exp that a general `powf` may perform internally); shared the `neg_theta_w_sq` computation in `jac` and `rval_with_jac` instead of recomputing `theta_w²` plus a separate negation.
- AbsoluteExponential: shared `neg_theta_w` in `rval_with_jac`, computing `r` and `jr` from the same intermediate; avoided a redundant `rval_from_distances` call in `rval_with_jac`.
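As a rough sketch of the two ideas above (squaring by multiplication and reusing one negated intermediate for both the value and the Jacobian), consider a simplified squared exponential kernel. This is an assumption-laden illustration, not the PR's code: it uses plain `f64` slices, drops the weights, assumes the form r = exp(-Σ θⱼ·dⱼ²), and all function names are hypothetical.

```rust
/// Reference version: squares with powf and rebuilds the negated term twice.
fn sq_exp_with_jac_powf(theta: &[f64], d: &[f64]) -> (f64, Vec<f64>) {
    let sum: f64 = theta
        .iter()
        .zip(d)
        .map(|(&t, &dj)| t * dj.powf(2.0))
        .sum();
    let r = (-sum).exp();
    let jac: Vec<f64> = theta
        .iter()
        .zip(d)
        .map(|(&t, &dj)| -2.0 * t * dj * r)
        .collect();
    (r, jac)
}

/// Sketch of the optimized shape: negate theta once, square with `v * v`,
/// and derive both r and dr/dd from the same intermediate.
fn sq_exp_with_jac_shared(theta: &[f64], d: &[f64]) -> (f64, Vec<f64>) {
    // Shared intermediate: -theta_j, computed a single time.
    let neg_theta: Vec<f64> = theta.iter().map(|&t| -t).collect();

    // Exponent: sum_j (-theta_j) * d_j * d_j, with no powf call.
    let exponent: f64 = neg_theta
        .iter()
        .zip(d)
        .map(|(&nt, &dj)| nt * dj * dj)
        .sum();
    let r = exponent.exp();

    // dr/dd_j = 2 * (-theta_j) * d_j * r, reusing neg_theta and r.
    let jac: Vec<f64> = neg_theta
        .iter()
        .zip(d)
        .map(|(&nt, &dj)| 2.0 * nt * dj * r)
        .collect();
    (r, jac)
}

fn main() {
    // Illustrative inputs only.
    let theta = [1.0, 0.5];
    let d = [1.0, 2.0];
    let (r_ref, jac_ref) = sq_exp_with_jac_powf(&theta, &d);
    let (r, jac) = sq_exp_with_jac_shared(&theta, &d);
    println!("powf:   r = {r_ref}, jac = {jac_ref:?}");
    println!("shared: r = {r}, jac = {jac:?}");
}
```

Both functions compute the same quantities, so any gain comes only from cheaper arithmetic and fewer temporaries, which is consistent with the modest improvement reported for these kernels.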
This PR achieves a 2x speedup when Egor uses Matern32 or Matern52 kernels.