Add conservative_2d: conservative regrid for grids that aren't 1D-separable #70
thodson-usgs wants to merge 7 commits into
Hi Timothy, thanks for contributing. Nice to see that it was possible to add regridding for non-rectilinear grids. I had thought before of using shapely and intersecting polygons, but I could not get it to perform well enough (didn't know about […]). I do think the LLM went a bit wild: the source file seems a bit overengineered and does not fit the syntax and structure of the rest of the code, which makes it hard to review and to follow the logic flow. The LLM also thinks that polygons crossing the antimeridian are a "fringe case"; I would disagree 😅
Honestly, I haven't done much yet :), but I thought a draft PR would be a good way to get the ball rolling. The bot also identified a few other small bugfixes and upstream optimizations that I'm working through. As those progress, I'm hoping we can pair on this PR, though I want to be respectful of your time. Cloud-friendly regridding has long been on our wishlist at USGS (and other groups), so we were excited when […]. Even if it's never merged, I think this will be a useful data point on a longstanding problem.
Ah, this might be in part because I fed it a Julia implementation for reference (which should be acknowledged). I'll see if the LLM can clean things up before I make a pass.
Yeah, it's a good starting point, and at a basic level it seems to perform well. So as a source of inspiration it seems promising.
Nice, a more generalized conservative regridding approach that bypasses ESMF would be a great addition. Since xESMF is the current way to achieve this in Python, adding some direct comparisons there would be nice, as we've done with some of the other methods. Broadly agree with @BSchilperoort though. Since this package is very lean and focused as is (only ~2600 lines in src), we should be cautious about dumping in a bunch of code without consideration for maintainability.
Thanks @slevang,
I don't think it is out of scope, and certainly don't mean to discourage you! Only that we should make sure some humans understand the code and agree that it is clean and maintainable before merging. The current focus of this package is basically "tricks for rectilinear grids" since those can be highly optimized. But I don't see why we can't expand to cover optimizations for other regridding problems, as long as it all stays semi-coherent. For example, I've worked on a much more efficient version of sparse point-wise interpolation for chunked data that I could probably add as a feature here.
613cdcb to 3ed200b
Are you willing to share that code? Even in a rough form? I propose that we prompt a bot to integrate your optimizations into a new branch. Then write a prompt to benchmark the performance against xESMF. Then a final prompt to profile the code to explain any performance gaps. This much should all entail minimal human effort. If the results are interesting, we can take it further. If you prefer not to share the code, I still encourage you to try it.
Add ConservativeRegridder and the .regrid.conservative_2d accessor for grids the axis-factored .conservative path can't express: curvilinear (2D lat/lon coords), unstructured meshes (ConservativeRegridder.from_polygons), and arbitrary grid-to-polygon aggregation. Cell intersections go through shapely 2 with sparse-COO weight storage, a rectilinear analytic fast path, threaded GEOS for curvilinear grids, an analytic cylindrical equal-area manifold="cea" option for spherical areas on lat/lon grids, antimeridian handling, dtype preservation, and netCDF weight-matrix save/load. Installable via the optional conservative-2d extra.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
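The rectilinear analytic fast path named in this commit can be illustrated with a small NumPy sketch. It assumes the fast path amounts to an outer product of per-axis interval overlaps, which the commit message does not spell out. The helper `overlap_1d` and the toy grids are hypothetical and not taken from the PR, and the shapely/GEOS path used for curvilinear cells is not shown.

```python
import numpy as np


def overlap_1d(src_edges, dst_edges):
    """Overlap length of every dst interval with every src interval -> (n_dst, n_src)."""
    lo = np.maximum(dst_edges[:-1, None], src_edges[None, :-1])
    hi = np.minimum(dst_edges[1:, None], src_edges[None, 1:])
    return np.clip(hi - lo, 0.0, None)  # disjoint intervals give zero overlap


# Toy grids (hypothetical): a 2x4 source grid coarsened to a 1x2 target grid.
src_x, src_y = np.linspace(0, 4, 5), np.linspace(0, 2, 3)
dst_x, dst_y = np.linspace(0, 4, 3), np.linspace(0, 2, 2)

ox = overlap_1d(src_x, dst_x)  # (2, 4)
oy = overlap_1d(src_y, dst_y)  # (1, 2)

# Intersection area of dst cell (J, I) with src cell (j, i) is oy[J, j] * ox[I, i].
areas = np.einsum("Jj,Ii->JIji", oy, ox).reshape(
    oy.shape[0] * ox.shape[0], oy.shape[1] * ox.shape[1]
)

# Row-normalize areas into weights and apply to a field stored as (ny, nx).
weights = areas / areas.sum(axis=1, keepdims=True)
field = np.arange(8.0).reshape(2, 4)
out = (weights @ field.ravel()).reshape(1, 2)
```

Because both grids are rectangles in the same coordinate space, no polygon intersection is needed here; the area matrix falls out of two small 1D overlap tables.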
Add curvilinear, unstructured, and grid->regions demo notebooks with executed outputs (the docs build runs with nb_execution_mode="off", so outputs must be committed to render, matching the existing demos), wire them into the notebooks toctree, and list conservative_2d on the index and quickstart pages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dfac234 to fad4a5b
The apply matrix was stored as sparse.COO, but sparse's `ndarray @ COO` path uses a numba COO kernel that never converts to CSR (unlike its own `COO @ COO` and `GCXS @ ndarray` paths), making it 2-18x slower than scipy's C SpMM. Build the apply matrix once as a scipy CSR (n_dst, n_src) and evaluate `(W @ flat.T).T`. scipy is already a hard dependency, so this also covers the dense fallback.

Measured end-to-end (720x360 -> 360x180 rectilinear regrid):
T=1: 3.2ms -> 0.40ms (8x)
T=50: 32.1ms -> 11.1ms (2.9x)

Outputs are unchanged: test_conservative_2d still matches .conservative to 1e-12 and conserves mass to 1e-12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
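A minimal sketch of the CSR apply pattern this commit describes, on a toy weight matrix. The sizes and the names `W` and `flat` mirror the commit message's expression but are otherwise illustrative; this is not the PR's actual code.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sizes: 6 source cells, 3 destination cells, 4 leading (e.g. time) slices.
n_src, n_dst, n_time = 6, 3, 4

# Toy weights of shape (n_dst, n_src): each destination cell averages two source cells.
rows = np.repeat(np.arange(n_dst), 2)
cols = np.arange(n_src)
vals = np.full(n_src, 0.5)
W = sp.csr_matrix((vals, (rows, cols)), shape=(n_dst, n_src))

# Data flattened to (n_time, n_src), i.e. the spatial dims raveled into one axis.
flat = np.arange(n_time * n_src, dtype=np.float64).reshape(n_time, n_src)

# Sparse-times-dense product, transposed back to (n_time, n_dst).
out = (W @ flat.T).T

# Dense reference for comparison.
dense = flat @ W.toarray().T
```

Putting the sparse matrix on the left keeps the product on scipy's CSR-times-dense routine, which is the point of the `(W @ flat.T).T` form.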
- Share the regrid-weight dtype policy `np.result_type(np.float32, ...)` via `utils.min_weight_dtype`, used by both `conservative.format_weights` and `conservative_2d._result_dtype` (one source of truth for the precision floor).
- Extract the thrice-repeated rectilinear-coord-pair test into `_is_rectilinear_pair`, so "rectilinear" has one definition.
- Drop a redundant dtype-equality guard before `astype(copy=False)` in `_apply_core` (the cast already no-ops when dtypes match).

Pure refactors; no behavior change (106 passed, ruff + mypy unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
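The two NumPy behaviours this refactor relies on can be checked in isolation. The function `weight_dtype_floor` below is a hypothetical stand-in; the actual signature of `utils.min_weight_dtype` is not shown in the commit message.

```python
import numpy as np


def weight_dtype_floor(*dtypes):
    """Hypothetical helper: promote the given dtypes with float32 as the floor."""
    return np.result_type(np.float32, *dtypes)


# float16 input is lifted to the float32 floor; float64 input is left at float64.
half = weight_dtype_floor(np.float16)
double = weight_dtype_floor(np.float64)

# astype(copy=False) returns the same array object when no cast is needed,
# which is why an explicit dtype-equality guard in front of it is redundant.
x = np.ones(3, dtype=np.float32)
same = x.astype(np.float32, copy=False)
cast = x.astype(np.float64, copy=False)
```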
…ative_2d
- README and the docs index still described xarray-regrid as regridding only
"between rectilinear grids" — broaden to cover conservative_2d's curvilinear,
unstructured, and grid-to-polygon support; add conservative_2d to the method
list and document the `conservative-2d` install extra. Fix two typos in the
index overview ("possibly"/"effiently").
- to_netcdf's docstring said it saves "the weight matrix"; it saves the
unnormalized area-intersection matrix (self.areas). Make it precise, matching
the class docstring's deliberate areas-vs-weights distinction.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_Grid.bounds is read only by the rectilinear analytic fast-path; curvilinear and from_polygons grids run the GEOS/STRtree path over the polygons and never touch bounds. Stop computing shapely.bounds for those grids (now None), removing a wasted full-array pass at build time that scales with cell count (e.g. two shapely.bounds sweeps over the source mesh in from_polygons). No behavior change (106 passed; ruff + mypy unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stacks the s2geometry backend onto the rebuilt conservative_2d (PR xarray-contrib#70).

manifold="s2" intersects cells as great-circle polygons on the sphere using the optional `spherely` package, plugging into the existing _GRID_BUILDERS registry: an s2 grid carries spherely Geography cell polygons (s2_polys) alongside planar shapely boxes (used only as the STRtree candidate-pair bbox filter, since spherely has no spatial index), and _build_intersection_areas dispatches to spherely.intersection / spherely.area for s2-vs-s2 pairs (areas in steradians; the Earth radius cancels through row normalization).

Adds the `spherical` install extra (spherely>=0.1.1), s2 tests (gated on spherely via importorskip), and a spherical demo notebook wired into the docs toctree.

Reconciled from the old spherely-integration lineage onto the new xarray-contrib#70 architecture: dropped the superseded conservative_polygon.py module, whose RegridderMetadata is replaced by RegridSpec / _conservative_2d_serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
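The remark that the Earth radius cancels through row normalization can be checked numerically. This sketch uses made-up intersection areas and a hypothetical `row_normalize` helper; it does not call spherely.

```python
import numpy as np

# Made-up intersection areas on the unit sphere (steradians), shape (n_dst, n_src).
areas_sr = np.array([[0.02, 0.01, 0.00],
                     [0.00, 0.015, 0.03]])

# The same areas on a sphere of radius R (square metres).
R = 6_371_000.0
areas_m2 = areas_sr * R**2


def row_normalize(a):
    """Turn intersection areas into weights: each destination row sums to 1."""
    return a / a.sum(axis=1, keepdims=True)


w_sr = row_normalize(areas_sr)
w_m2 = row_normalize(areas_m2)
```

The factor R² multiplies both the numerator and the row sum, so the weights are the same in either unit.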
…ion test

The existing test_cea_conserves_integral asserts cea beats planar by ~10x on a lon-constant field (a relative accuracy check). Add a strict gold-standard conservation test: cos^2(lat) * (1.5 + sin(lon)) (known spherical integral 4*pi), regridded with manifold="cea" onto a co-extensive global grid, conserved to machine precision measured with INDEPENDENT analytic spherical cell areas (sin-latitude bands x dlon), not the regridder's own matrix, with the source integral approaching the known value. A varying field + independent areas + matched domains is what makes it non-tautological (a self-area check is a row-normalization identity; a constant field conserves regardless).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
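The shape of such a conservation check can be sketched without the regridder itself. Below, an area-weighted 2x2 block mean on nested global grids stands in for the `manifold="cea"` regrid, which is an assumption made purely for illustration; the helper names are hypothetical. The analytic field integrates to 4π because the integral of cos³(lat) over latitude is 4/3 and the integral of (1.5 + sin(lon)) over longitude is 3π.

```python
import numpy as np


def cell_areas(lat_edges, lon_edges):
    """Analytic unit-sphere cell areas: (sin-latitude band) x dlon, edges in radians."""
    return np.outer(np.diff(np.sin(lat_edges)), np.diff(lon_edges))


def centers(edges):
    return 0.5 * (edges[:-1] + edges[1:])


def field(lat, lon):
    return np.cos(lat)[:, None] ** 2 * (1.5 + np.sin(lon))[None, :]


# Fine 180x360 global grid and a nested 90x180 coarse grid.
lat_f = np.linspace(-np.pi / 2, np.pi / 2, 181)
lon_f = np.linspace(-np.pi, np.pi, 361)
lat_c, lon_c = lat_f[::2], lon_f[::2]

a_f = cell_areas(lat_f, lon_f)
a_c = cell_areas(lat_c, lon_c)  # independent of the regrid weights below
src = field(centers(lat_f), centers(lon_f))

# Stand-in regrid (assumption): area-weighted mean over aligned 2x2 blocks.
num = (src * a_f).reshape(90, 2, 180, 2).sum(axis=(1, 3))
den = a_f.reshape(90, 2, 180, 2).sum(axis=(1, 3))
dst = num / den

# Integrals measured with the analytic areas of each grid.
src_integral = (src * a_f).sum()
dst_integral = (dst * a_c).sum()
rel_err = abs(dst_integral - src_integral) / abs(src_integral)
```

Measuring the target integral with `a_c` rather than `den` is what keeps the check from reducing to a row-normalization identity.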
Summary

Adds conservative regridding for grids that aren't 1D-separable, a capability the existing `.conservative` method cannot express:
- `lat[i, j]` / `lon[i, j]` coordinate variables (ORCA, tripolar, rotated regional).
- `ConservativeRegridder.from_polygons(...)` accepts arbitrary shapely polygon arrays (MPAS, ICON, finite-element).
- […] `from_polygons` path. Auxiliary target coordinates (region_id, labels, scalar metadata) ride along with the output.
- `manifold="cea"` projects cell edges through an analytic Lambert cylindrical equal-area map before intersecting, matching the axis-factored path's sin-weighted accuracy at fast-path cost. `manifold="planar"` (the default) intersects in the raw coordinate space.
- `from_polygons(..., periodic=True)` unwraps rings that cross ±180°.

Internals:
- `ConservativeRegridder` class: reusable regridder; builds the sparse area-intersection matrix once in `__init__`, applies to many fields.
- `.T` / `.transpose()` returns the backward regridder, reusing any already-materialized weight matrices.
- `manifold` selects the cell-builder (`"planar"`, `"cea"`); new backends (e.g. true great-circle `"s2"`) plug in with a single registry insert.
- `.to_netcdf` / `.from_netcdf`: persist the weight matrix plus reproducibility metadata (dim names, shapes, grid ranges, manifold, xarray-regrid version, timestamp, schema version) in root attributes, with source / target coord-only groups alongside. Schema-versioned for forward compat.
- `.regrid.conservative_2d(...)` on xarray DataArrays / Datasets.
- `pip install xarray-regrid[conservative-2d]` pulls `shapely>=2.0`, `sparse`, and `h5netcdf`. Existing users unaffected.

Motivation
`.conservative` uses Stephan Hoyer's axis-factored 1D overlap: fast and elegant but strictly rectilinear (1D-separable). Users with curvilinear ocean models, unstructured climate meshes, or grid-to-region aggregation currently reach for xESMF, which:
- […] `dask.distributed`.

`conservative_2d` fills the gap with pure shapely + sparse. Works on Windows, composes with distributed dask, installs with a single `pip install shapely`.

Naming follows xESMF's short-method-name convention. "`_2d`" distinguishes full 2D cell-polygon intersection from the 1D axis-factored approach and doesn't prejudge whether the grid is curvilinear or unstructured.

Design notes
- […] `intersects` predicate) and relies on the subsequent `area > 0` filter to drop bbox false-positives, which is much cheaper for tight quad cells. `from_polygons` defaults to `predicate_filter=True` (GEOS predicate on), since arbitrary user polygons often have loose bounding boxes; pass `predicate_filter=False` for tight-bbox inputs.
- `shapely.intersection` over candidate pairs is parallelized across a `ThreadPoolExecutor` (above ~1k pairs). ~3–4× on 8 cores.
- […] `sparse.COO` (dense fallback if `sparse` is absent); apply goes through `sparse.matmul` inside `xr.apply_ufunc(dask="parallelized")`. A cached, pre-sorted/transposed apply matrix avoids a per-call re-sort.
- […] `.conservative`: two-pass value + mask matmul, `nan_threshold` with the same interpretation (reuses `get_valid_threshold`).
- […] `skipna`, so uncovered target cells (domain boundaries, polygon holes) always produce NaN.
- […] `_apply_core`.

Test plan
- […] `conservative_2d` exactly reproduces `.conservative` (planar) to 1e-12.
- […] `from_polygons` mass conservation to 1e-12 rel.
- `.T` transpose on rectilinear and curvilinear; constant field round-trip exact on aligned grids.
- `manifold="cea"`: matches the factored sin-weighted path on lat/lon grids to within grid quadrature, and conserves the spherical integral ≥10× better than planar.
- […] `[0,360]` ↔ `[-180,180]` alignment (both directions); `from_polygons(periodic=True)` routes samples across the seam.
- […] `from_polygons` output.
- […] `manifold` / out-of-range `nan_threshold` / non-1D polygon arrays rejected.
- […] `test_conservative_2d.py` module passing on this branch.

Docs
Three demo notebooks under `docs/notebooks/demos/`, committed with executed outputs (the docs build runs `nb_execution_mode="off"`, so outputs render only if committed, matching the existing demos), and wired into the notebooks toctree plus the index / quickstart method lists:
- `demo_conservative_2d_curvilinear.ipynb`: rotated 2D target; conservation check via the public `source_coverage_areas` / `target_areas` diagnostics.
- `demo_conservative_2d_regions.ipynb`: air-temperature tutorial dataset aggregated onto US states (dissolved `geodatasets.ncovr`); map + ranked bar chart, conservation check, save/reload, seasonal reuse.
- `demo_conservative_2d_unstructured.ipynb`: Voronoi mesh via `from_polygons` + `.to_netcdf` / `.from_netcdf`.

CHANGELOG updated under `## Unreleased`.

Follow-ups (out of scope for this PR)
- `manifold="s2"` backed by the optional `spherely` dependency.
- […] `from_polygons`, but wants a dedicated API surface for loading region polygons and a `DataArray`-aware flatten helper so callers don't hand-`ravel()` to match polygon order.

🤖 Generated with Claude Code
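The two-pass value + mask matmul mentioned under Design notes can be sketched in plain NumPy. This is an illustration under stated assumptions, not the PR's `_apply_core`: the function `apply_with_nan` and its `min_valid_fraction` parameter are hypothetical, and how the real `nan_threshold` maps onto such a fraction (via `get_valid_threshold`) is not shown here.

```python
import numpy as np


def apply_with_nan(weights, values, min_valid_fraction=0.5):
    """Hypothetical sketch: weights is (n_dst, n_src), values is (n_src,) and may hold NaN."""
    valid = ~np.isnan(values)
    filled = np.where(valid, values, 0.0)

    num = weights @ filled                          # pass 1: weighted sum of valid values
    frac = weights @ valid.astype(weights.dtype)    # pass 2: weight covered by valid data

    with np.errstate(invalid="ignore", divide="ignore"):
        out = num / frac                            # renormalize by the valid coverage
    out[frac < min_valid_fraction] = np.nan         # too little valid coverage -> NaN
    return out


# Toy case: the last target row has no overlapping source cells at all.
weights = np.array([[0.5, 0.5, 0.0, 0.0],
                    [0.0, 0.0, 0.5, 0.5],
                    [0.0, 0.0, 0.0, 0.0]])
values = np.array([1.0, 3.0, np.nan, 5.0])
out = apply_with_nan(weights, values, min_valid_fraction=0.5)
strict = apply_with_nan(weights, values, min_valid_fraction=0.75)
```

An uncovered target cell has zero valid coverage, so it comes out as NaN regardless of the threshold, which matches the behaviour described for domain boundaries and polygon holes.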