perf(conservative): apply weights via scipy CSR matmul (robust without opt-einsum)#74
Draft
thodson-usgs wants to merge 1 commit into
Draft
Conversation
…r without opt_einsum) The axis-factored conservative method stored its weights as `sparse.COO` (xarray-contrib#49) and applied them with one multi-operand `xr.dot(data, W_lat, W_lon, optimize=True)`. That contraction only finds an efficient separable path when `opt_einsum` is installed; without it -- e.g. installing the `conservative-2d` extra (which brings `sparse`) but not `accel` (which brings `opt-einsum`) -- the sparse multi-operand einsum is catastrophic (13-29 s for a 48-step 0.5deg field) and the result comes back `sparse.COO`, a wasteful container for a dense field. Apply each per-axis weight with a scipy CSR sparse-dense matmul instead (`_csr_apply_axis`, via `apply_ufunc` so dask / output_chunks / skipna are preserved). One unified path: the weight is compressed to CSR whether stored as `sparse.COO` or dense. Separability makes the sequential per-axis contraction identical to the fused one. This is fast and robust whether or not opt_einsum is present, keeps the weights compressed (the (n_src, n_dst) matrix is never densified -- it reaches hundreds of MB to GB at high resolution), and yields a dense result. scipy is already a base dependency, so no new requirement. Speedups without opt_einsum: ~56x at 1deg, ~60-80x at 0.5deg. Numerical results unchanged (sequential vs fused contraction differs only at the ~1e-15 fp level). Tests (tests/test_regrid.py): - test_conservative_conserves_known_integral: gold-standard known-integral conservation -- cos^2(lat)*(1.5+sin(lon)) -> 4*pi, conserved to ~4e-7 measured with independent analytic spherical cell areas. - test_conservative_returns_dense_output: regression guard that the result is a dense ndarray, not sparse.COO. mypy: add a scipy.* override (scipy ships no stubs) and annotate the new returns; error count unchanged from baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The axis-factored
conservativeregridder applied its (sparse) per-axis weights with a single multi-operandxr.dot(data, W_lat, W_lon, optimize=True). That contraction only finds an efficient separable path whenopt_einsumis installed. Installing theconservative-2dextra (which bringssparse) withoutaccel(which bringsopt-einsum) hits the slow path: the sparse multi-operand einsum is 20–80× slower (13–29 s for a 48-step 0.5° field) and returns asparse.COOresult — a wasteful container for a dense field.This applies each per-axis weight with a scipy CSR sparse-dense matmul instead (
_csr_apply_axis, viaapply_ufuncso dask /output_chunks/skipnaare preserved). One unified path — the weight is compressed to CSR whether stored assparse.COOor dense. Separability makes the sequential per-axis contraction identical to the fused one. It is fast with or withoutopt_einsum, keeps the weights compressed (the(n_src, n_dst)matrix is never densified — it reaches hundreds of MB–GB at high resolution), and yields a dense result.scipyis already a base dependency, so no new requirement.Benchmark (no
opt_einsum, eager, ntime=48)xr.dot)With
opt_einsumpresent the old path was already fast; CSR matches it. Numerical results are unchanged (sequential vs fused contraction differs only at ~1e-15).Tests
test_conservative_conserves_known_integral— gold-standard known-integral conservation (cos²(lat)·(1.5+sin(lon)) → 4π, conserved to ~4e-7 measured with independent analytic spherical cell areas).test_conservative_returns_dense_output— regression guard that the result is a dense ndarray.sparse-installed and no-sparsecode paths.Note
This effectively supersedes #49's sparse weights for the axis-factored method's apply step — the weights are still stored compressed; they are just multiplied via CSR instead of a sparse einsum.
utils.overlapstill builds the weight dense transiently during the build; making that sparse end-to-end is a separate, deeper optimization left for later.🤖 Generated with Claude Code