Skip to content

Add conservative_2d: conservative regrid for grids that aren't 1D-separable#70

Draft
thodson-usgs wants to merge 7 commits into
xarray-contrib:mainfrom
thodson-usgs:conservative
Draft

Add conservative_2d: conservative regrid for grids that aren't 1D-separable#70
thodson-usgs wants to merge 7 commits into
xarray-contrib:mainfrom
thodson-usgs:conservative

Conversation

@thodson-usgs
Copy link
Copy Markdown

@thodson-usgs thodson-usgs commented Apr 20, 2026

Summary

Adds conservative regridding for grids that aren't 1D-separable — a capability the existing .conservative method cannot express:

  • Curvilinear grids — accepts 2D lat[i, j] / lon[i, j] coordinate variables (ORCA, tripolar, rotated regional).
  • Unstructured meshesConservativeRegridder.from_polygons(...) accepts arbitrary shapely polygon arrays (MPAS, ICON, finite-element).
  • Grid → polygon aggregation — regrid structured data onto country shapes or watershed boundaries, same from_polygons path. Auxiliary target coordinates (region_id, labels, scalar metadata) ride along with the output.
  • Spherical areas on lat/lonmanifold="cea" projects cell edges through an analytic Lambert cylindrical equal-area map before intersecting, matching the axis-factored path's sin-weighted accuracy at fast-path cost. manifold="planar" (the default) intersects in the raw coordinate space.
  • Antimeridian support — structured lat/lon grids whose longitude spans the dateline are aligned automatically (per-value wrap for 1D rectilinear coords; uniform shift for 2D curvilinear). For polygon input, from_polygons(..., periodic=True) unwraps rings that cross ±180°.

Internals:

  • ConservativeRegridder class — reusable regridder; builds the sparse area-intersection matrix once in __init__, applies to many fields. .T / .transpose() returns the backward regridder, reusing any already-materialized weight matrices.
  • Geometry-backend registrymanifold selects the cell-builder ("planar", "cea"); new backends (e.g. true great-circle "s2") plug in with a single registry insert.
  • .to_netcdf / .from_netcdf — persist the weight matrix plus reproducibility metadata (dim names, shapes, grid ranges, manifold, xarray-regrid version, timestamp, schema version) in root attributes, with source / target coord-only groups alongside. Schema-versioned for forward compat.
  • Accessor — exposed via .regrid.conservative_2d(...) on xarray DataArrays / Datasets.
  • Optional extrapip install xarray-regrid[conservative-2d] pulls shapely>=2.0, sparse, and h5netcdf. Existing users unaffected.

Motivation

.conservative uses Stephan Hoyer's axis-factored 1D overlap — fast and elegant but strictly rectilinear (1D-separable). Users with curvilinear ocean models, unstructured climate meshes, or grid-to-region aggregation currently reach for xESMF, which:

  • Requires ESMPy (C/Fortran via conda-forge only; no Windows support)
  • Doesn't support dask.distributed
  • Is a heavy dep cascade

.conservative_2d fills the gap with pure shapely + sparse. Works on Windows, composes with distributed dask, installs with a single pip install shapely.

Naming follows xESMF's short-method-name convention. "_2d" distinguishes full 2D cell-polygon intersection from the 1D axis-factored approach and doesn't prejudge whether the grid is curvilinear or unstructured.

Design notes

  • Rectilinear fast path: when both grids are axis-aligned rectangles, skip GEOS clipping entirely and compute intersection areas analytically from the bounds. ~30× faster than the GEOS path at 1M cells.
  • STRtree candidate filtering: the structured path queries by bounding-box overlap only (no GEOS intersects predicate) and relies on the subsequent area > 0 filter to drop bbox false-positives — much cheaper for tight quad cells. from_polygons defaults to predicate_filter=True (GEOS predicate on), since arbitrary user polygons often have loose bounding boxes; pass predicate_filter=False for tight-bbox inputs.
  • Threaded GEOS: for curvilinear grids, shapely 2's GIL-free shapely.intersection over candidate pairs is parallelized across a ThreadPoolExecutor (above ~1k pairs). ~3–4× on 8 cores.
  • Sparse weight matrix: stored as sparse.COO (dense fallback if sparse is absent); apply goes through sparse.matmul inside xr.apply_ufunc(dask="parallelized"). A cached, pre-sorted/transposed apply matrix avoids a per-call re-sort.
  • NaN semantics match existing .conservative: two-pass value + mask matmul, nan_threshold with the same interpretation (reuses get_valid_threshold).
  • Coverage mask applied regardless of skipna, so uncovered target cells (domain boundaries, polygon holes) always produce NaN.
  • Dtype preserved: float32 in → float32 out. sparse matmul promotes to float64 internally; cast back in _apply_core.

Test plan

  • Structured rectilinear: conservative_2d exactly reproduces .conservative (planar) to 1e-12.
  • Dask + chunked input: identical results to non-dask; spatial chunks rechunked automatically.
  • Curvilinear target (rotated 2D): finite values, transpose round-trip lands back on source dims.
  • Unstructured mesh: from_polygons mass conservation to 1e-12 rel.
  • NaN handling: strict vs lenient thresholds produce the expected ordering.
  • .T transpose on rectilinear and curvilinear; constant field round-trip exact on aligned grids.
  • manifold="cea": matches the factored sin-weighted path on lat/lon grids to within grid quadrature, and conserves the spherical integral ≥10× better than planar.
  • Antimeridian: constant field through a dateline-crossing structured grid is preserved exactly; cross-convention [0,360][-180,180] alignment (both directions); from_polygons(periodic=True) routes samples across the seam.
  • Auxiliary target coords (non-dim + scalar) attached on from_polygons output.
  • netCDF save/load: bit-identical output on reload; manifold preserved; schema mismatch rejected.
  • Dtype preservation: float32/64 round-trip; integer input promotes.
  • Empty target / polygon holes → NaN (regression-guarded).
  • Invalid manifold / out-of-range nan_threshold / non-1D polygon arrays rejected.
  • 35-test test_conservative_2d.py module passing on this branch.

Docs

Three demo notebooks under docs/notebooks/demos/, committed with executed outputs (the docs build runs nb_execution_mode="off", so outputs render only if committed — matching the existing demos), and wired into the notebooks toctree plus the index / quickstart method lists:

  • demo_conservative_2d_curvilinear.ipynb — rotated 2D target; conservation check via the public source_coverage_areas / target_areas diagnostics.
  • demo_conservative_2d_regions.ipynb — air-temperature tutorial dataset aggregated onto US states (dissolved geodatasets.ncovr); map + ranked bar chart, conservation check, save/reload, seasonal reuse.
  • demo_conservative_2d_unstructured.ipynb — Voronoi mesh via from_polygons + .to_netcdf / .from_netcdf.

CHANGELOG updated under ## Unreleased.

Follow-ups (out of scope for this PR)

  • Great-circle (s2) geometry: drops into the geometry-backend registry as manifold="s2" backed by the optional spherely dependency.
  • Grid-to-region aggregation (like xagg): natural extension of from_polygons, but wants a dedicated API surface for loading region polygons and a DataArray-aware flatten helper so callers don't hand-ravel() to match polygon order.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs changed the title Add polygon-based conservative regridder Add conservative_2d: conservative regrid for grids that aren't 1D-separable Apr 20, 2026
@BSchilperoort
Copy link
Copy Markdown
Contributor

Hi Timothy,

Thanks for contributing. Nice to see that it was possible to add regridding for non-rectilinear grids. I had thought before of using shapely and intersecting polygons, however I could not get it to perform well enough (didn't know about STRtree!).

I do think the LLM went a bit wild and the source file seems a bit overengineered and does not fit the syntax/structuring of the rest of the code. This does not make it very easy to review and see the logic flow.

The LLM also thinks that polygons crossing the antimeridian is a "fringe case", I would disagree 😅

@thodson-usgs
Copy link
Copy Markdown
Author

Honestly, I haven't done much yet :), but I thought a draft PR would be a good way to get the ball rolling.

The bot also identified a few other small bugfixes and upstream optimizations that I'm working through. As those progress, I'm hoping we can pair on this PR, though I want to be respectful of your time. Cloud friendly regridding has long been on our wishlist at USGS (and other groups), so we were excited when xarray-regrid appeared.
@kjdoore tested the package for us awhile back. Unfortunately, he went on to better things, but now regridding might be within the capabilities of the current LLMs...with some steering.

Even if it's never merged, I think this will be a useful datapoint on a longstanding problem.

@thodson-usgs
Copy link
Copy Markdown
Author

thodson-usgs commented Apr 21, 2026

I do think the LLM went a bit wild and the source file seems a bit overengineered and does not fit the syntax/structuring of the rest of the code. This does not make it very easy to review and see the logic flow.

Ah, this might be in part because I fed it a Julia implementation for reference (which should be acknowledged). I'll see if the LLM can clean things up, before I make a pass.

@BSchilperoort
Copy link
Copy Markdown
Contributor

BSchilperoort commented Apr 21, 2026

Even if it's never merged, I think this will be a useful datapoint on a longstanding problem.

Yeah it's a good starting point and at a basic level it seems to perform well. So as a source of inspiration it seems promising.

@slevang
Copy link
Copy Markdown
Collaborator

slevang commented Apr 21, 2026

Nice, a more generalized conservative regridding approach that bypasses ESMF would be a great addition. Since xESMF is the current way to achieve this in python, adding some direct comparisons there would be nice as we've done with some of the other methods.

Broadly agree with @BSchilperoort though. Since this package is very lean and focused as is (only ~2600 lines in src), we should be cautious about dumping in a bunch of code without consideration for maintainability.

@thodson-usgs
Copy link
Copy Markdown
Author

Thanks @slevang,
Right, I forgot ambout xESMF. I will investigate it as well. If this particular PR is out of scope, I'm fine closing it. If it leads to improvements in one or both packages, great. I also expect this exercise will identify some simple upstream performance improvements as claude works through and stress tests and profiles different pieces of the stack.

@slevang
Copy link
Copy Markdown
Collaborator

slevang commented Apr 21, 2026

I don't think it is out of scope, and certainly don't mean to discourage you! Only that we should make sure some humans understand the code and agree that it is clean and maintainable before merging.

The current focus of this package is basically "tricks for rectilinear grids" since those can be highly optimized. But I don't see why we can't expand to cover optimizations for other regridding problems, as long as it all stays semi-coherent. For example I've worked on a much more efficient version of sparse point-wise interpolation for chunked data that I could probably add as a feature here.

@thodson-usgs thodson-usgs force-pushed the conservative branch 2 times, most recently from 613cdcb to 3ed200b Compare April 27, 2026 15:05
@thodson-usgs
Copy link
Copy Markdown
Author

For example I've worked on a much more efficient version of sparse point-wise interpolation for chunked data that I could probably add as a feature here.

Are you willing to share that code? Even in a rough form?

I propose that we prompt a bot to integrate your optimizations into a new branch. Then write a prompt to benchmark the performance against xESMF. Then a final prompt to profile the code to explain any performance gaps.

This much should all entail minimal human effort. If the results are interesting, we can take it further.

If you prefer not to share the code, I still encourage you to try it.

thodson-usgs and others added 2 commits May 29, 2026 09:31
Add ConservativeRegridder and the .regrid.conservative_2d accessor for
grids the axis-factored .conservative path can't express: curvilinear
(2D lat/lon coords), unstructured meshes (ConservativeRegridder.from_polygons),
and arbitrary grid-to-polygon aggregation. Cell intersections go through
shapely 2 with sparse-COO weight storage, a rectilinear analytic fast
path, threaded GEOS for curvilinear grids, an analytic cylindrical
equal-area manifold="cea" option for spherical areas on lat/lon grids,
antimeridian handling, dtype preservation, and netCDF weight-matrix
save/load. Installable via the optional conservative-2d extra.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add curvilinear, unstructured, and grid->regions demo notebooks with
executed outputs (the docs build runs with nb_execution_mode="off", so
outputs must be committed to render, matching the existing demos), wire
them into the notebooks toctree, and list conservative_2d on the index
and quickstart pages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thodson-usgs and others added 4 commits May 29, 2026 12:23
The apply matrix was stored as sparse.COO, but sparse's `ndarray @ COO`
path uses a numba COO kernel that never converts to CSR (unlike its own
`COO @ COO` and `GCXS @ ndarray` paths), making it 2-18x slower than
scipy's C SpMM. Build the apply matrix once as a scipy CSR (n_dst, n_src)
and evaluate `(W @ flat.T).T`. scipy is already a hard dependency, so this
also covers the dense fallback.

Measured end-to-end (720x360 -> 360x180 rectilinear regrid):
  T=1:  3.2ms  -> 0.40ms  (8x)
  T=50: 32.1ms -> 11.1ms  (2.9x)

Outputs are unchanged: test_conservative_2d still matches .conservative to
1e-12 and conserves mass to 1e-12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Share the regrid-weight dtype policy `np.result_type(np.float32, ...)` via
  `utils.min_weight_dtype`, used by both `conservative.format_weights` and
  `conservative_2d._result_dtype` (one source of truth for the precision floor).
- Extract the thrice-repeated rectilinear-coord-pair test into
  `_is_rectilinear_pair`, so "rectilinear" has one definition.
- Drop a redundant dtype-equality guard before `astype(copy=False)` in
  `_apply_core` (the cast already no-ops when dtypes match).

Pure refactors; no behavior change (106 passed, ruff + mypy unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ative_2d

- README and the docs index still described xarray-regrid as regridding only
  "between rectilinear grids" — broaden to cover conservative_2d's curvilinear,
  unstructured, and grid-to-polygon support; add conservative_2d to the method
  list and document the `conservative-2d` install extra. Fix two typos in the
  index overview ("possibly"/"effiently").
- to_netcdf's docstring said it saves "the weight matrix"; it saves the
  unnormalized area-intersection matrix (self.areas). Make it precise, matching
  the class docstring's deliberate areas-vs-weights distinction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_Grid.bounds is read only by the rectilinear analytic fast-path; curvilinear
and from_polygons grids run the GEOS/STRtree path over the polygons and never
touch bounds. Stop computing shapely.bounds for those grids (now None),
removing a wasted full-array pass at build time that scales with cell count
(e.g. two shapely.bounds sweeps over the source mesh in from_polygons).

No behavior change (106 passed; ruff + mypy unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thodson-usgs added a commit to thodson-usgs/xarray-regrid that referenced this pull request May 30, 2026
Stacks the s2geometry backend onto the rebuilt conservative_2d (PR xarray-contrib#70).
manifold="s2" intersects cells as great-circle polygons on the sphere using
the optional `spherely` package, plugging into the existing _GRID_BUILDERS
registry: an s2 grid carries spherely Geography cell polygons (s2_polys)
alongside planar shapely boxes (used only as the STRtree candidate-pair
bbox filter, since spherely has no spatial index), and
_build_intersection_areas dispatches to spherely.intersection /
spherely.area for s2-vs-s2 pairs (areas in steradians; the Earth radius
cancels through row normalization).

Adds the `spherical` install extra (spherely>=0.1.1), s2 tests (gated on
spherely via importorskip), and a spherical demo notebook wired into the
docs toctree.

Reconciled from the old spherely-integration lineage onto the new xarray-contrib#70
architecture: dropped the superseded conservative_polygon.py module — its
RegridderMetadata is replaced by RegridSpec / _conservative_2d_serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion test

The existing test_cea_conserves_integral asserts cea beats planar by ~10x on a
lon-constant field (a relative accuracy check). Add a strict gold-standard
conservation test: cos^2(lat) * (1.5 + sin(lon)) (known spherical integral
4*pi), regridded with manifold="cea" onto a co-extensive global grid, conserved
to machine precision measured with INDEPENDENT analytic spherical cell areas
(sin-latitude bands x dlon) — not the regridder's own matrix — with the source
integral approaching the known value. A varying field + independent areas +
matched domains is what makes it non-tautological (a self-area check is a
row-normalization identity; a constant field conserves regardless).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants