Skip to content

WIP: Add s2 manifold to conservative_2d (stacks on #70)#71

Draft
thodson-usgs wants to merge 8 commits into
xarray-contrib:mainfrom
thodson-usgs:spherely-integration
Draft

WIP: Add s2 manifold to conservative_2d (stacks on #70)#71
thodson-usgs wants to merge 8 commits into
xarray-contrib:mainfrom
thodson-usgs:spherely-integration

Conversation

@thodson-usgs
Copy link
Copy Markdown

Summary

Adds a manifold kwarg to ConservativeRegridder / .regrid.conservative_2d with three values:

  • "planar" (default, existing behavior): raw planar shapely intersection
  • "cea" (existing): analytic Lambert cylindrical equal-area projection before planar intersection — correct spherical areas with the planar fast-path cost, straight-line edges in the projected plane
  • "s2" (new): true great-circle edges via the optional spherely package (Google's s2geometry bindings). Full spherical geometry — correct for cells at any latitude or longitude extent.

Install with pip install xarray-regrid[spherical]. spherely ships pre-built wheels on PyPI for macOS / Linux x86_64 / Windows (Python 3.10–3.14), so no C library build needed for typical users. If spherely isn't installed, manifold="s2" raises ImportError with a clear install hint; the rest of the module keeps working.

Stacks on #70

This PR targets main because GitHub doesn't support cross-fork PRs with non-main bases, but the real delta vs #70 is small. Useful views:

Design notes

  • spherely bridging: _Grid gets an optional s2_polys: np.ndarray | None field. When both source and target carry s2_polys, _build_intersection_areas routes through spherely.intersection + spherely.area (both vectorized ufuncs) instead of planar shapely. The shapely STRtree on planar bboxes remains as the candidate-pair filter — necessary because spherely has no spatial index yet (tracked upstream at benbovy/spherely#72). Planar bboxes are a lossless over-estimate of spherical overlap, so the coarse filter is safe.
  • Manifold = Literal["planar", "cea", "s2"] type alias on public signatures; runtime _check_manifold retained for defense on deserialization paths (e.g. loading a netCDF file with a corrupted attribute).
  • Vectorized polygon construction fast-path: _s2_cell_polys checks hasattr(spherely, "polygons") and uses the batched ufunc when available. Upstream PR benbovy/spherely#52 (draft on my fork) adds this — measured ~2× speedup on construction, not blocking.
  • Schema version bump 1 → 2 on the persisted netCDF format (since spherical: bool became manifold: str on RegridderMetadata). Old files raise a clear schema-mismatch error.

Optional dependency layout

The spherical extra builds on top of conservative_2d:

conservative_2d = ["shapely>=2.0", "h5netcdf"]
spherical = ["xarray-regrid[conservative_2d]", "spherely>=0.1.1"]

Users can install either independently:

  • pip install xarray-regrid[conservative_2d] — 2D regrid without s2
  • pip install xarray-regrid[spherical] — full s2 support (transitively pulls the 2d stack)

Performance

s2 grid construction dominates one-time build cost. Measurements on benbovy/spherely vectorized-polygons branch (not yet released):

cells scalar (0.1.1) vectorized (polygons())
16,200 0.85s 0.56s
1,036,800 ~50s (extrapolated) ~30s

Smaller than my original 10-50× forecast because the s2geometry polygon construction itself is the floor; the vectorized ufunc only eliminates pybind11 per-call overhead. Details in docs/spherical_performance_plan.md (not included in this PR — in the repo as uncommitted scratch).

Apply-path cost is negligible; spherical weight matrix is ~ same size as planar.

Test plan

  • Correctness: manifold="s2" on a constant field roundtrips to the constant at machine precision.
  • Conservation: out · a_dst = A · s to 1e-12 on a random field with the s2 area matrix.
  • Pole-touching cells (outer lat edges at ±90°): the oriented=True s2 path picks the correct cap orientation — regression-tested.
  • Manifold enum rejection: manifold="bogus" raises ValueError at construction and again in _from_state for loaded files.
  • Install hint: manifold="s2" without spherely raises ImportError with the pip install xarray-regrid[spherical] hint.
  • netCDF save/load preserves the manifold choice; manifold="s2" survives roundtrip.
  • 105 tests pass on the branch (100 from conservative + 5 new s2-specific tests).

Docs

Fourth demo notebook docs/notebooks/demos/demo_conservative_2d_spherical.ipynb, structured to match the other three in the conservative_2d family:

  • Intro + three-manifold menu + install hint
  • cos²(lat) source on a 1° grid
  • All three manifolds integrated against analytic truth (8π/3)
  • Spatial s2 − cea diff map on a coarse grid (where great-circle vs straight-line edges diverge most)
  • .to_netcdf / .from_netcdf roundtrip preserving the manifold choice

Follow-ups (out of scope)

  • Curvilinear s2 (2D lat/lon coord arrays): not supported yet. Trivial extension once spherely ships a spatial index (so we can drop the shapely STRtree bridge that currently requires planar bboxes).
  • Antimeridian-crossing rectilinear cells: s2 handles these correctly (the oriented=True orientation is unambiguous), but the shapely planar bbox used for candidate filtering doesn't. A small lon-wrap fix would plug the gap; not critical for typical global grids.
  • Upstream contributions:
    • benbovy/spherely#52 — vectorized spherely.polygons(...) constructor. Draft branch exists; would give ~2× on build.
    • benbovy/spherely#72 — s2 spatial index. Larger contribution; would let us drop the shapely bridge entirely.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs force-pushed the spherely-integration branch 2 times, most recently from 3577fd0 to 79b68ce Compare April 20, 2026 18:51
@thodson-usgs thodson-usgs force-pushed the spherely-integration branch from 79b68ce to c79ae58 Compare April 21, 2026 20:41
@thodson-usgs thodson-usgs changed the title Add s2 manifold to conservative_2d (stacks on #70) WIP: Add s2 manifold to conservative_2d (stacks on #70) Apr 22, 2026
@thodson-usgs thodson-usgs force-pushed the spherely-integration branch from ec830bb to 2aa3c88 Compare April 28, 2026 15:16
thodson-usgs and others added 5 commits May 29, 2026 09:31
Add ConservativeRegridder and the .regrid.conservative_2d accessor for
grids the axis-factored .conservative path can't express: curvilinear
(2D lat/lon coords), unstructured meshes (ConservativeRegridder.from_polygons),
and arbitrary grid-to-polygon aggregation. Cell intersections go through
shapely 2 with sparse-COO weight storage, a rectilinear analytic fast
path, threaded GEOS for curvilinear grids, an analytic cylindrical
equal-area manifold="cea" option for spherical areas on lat/lon grids,
antimeridian handling, dtype preservation, and netCDF weight-matrix
save/load. Installable via the optional conservative-2d extra.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add curvilinear, unstructured, and grid->regions demo notebooks with
executed outputs (the docs build runs with nb_execution_mode="off", so
outputs must be committed to render, matching the existing demos), wire
them into the notebooks toctree, and list conservative_2d on the index
and quickstart pages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The apply matrix was stored as sparse.COO, but sparse's `ndarray @ COO`
path uses a numba COO kernel that never converts to CSR (unlike its own
`COO @ COO` and `GCXS @ ndarray` paths), making it 2-18x slower than
scipy's C SpMM. Build the apply matrix once as a scipy CSR (n_dst, n_src)
and evaluate `(W @ flat.T).T`. scipy is already a hard dependency, so this
also covers the dense fallback.

Measured end-to-end (720x360 -> 360x180 rectilinear regrid):
  T=1:  3.2ms  -> 0.40ms  (8x)
  T=50: 32.1ms -> 11.1ms  (2.9x)

Outputs are unchanged: test_conservative_2d still matches .conservative to
1e-12 and conserves mass to 1e-12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Share the regrid-weight dtype policy `np.result_type(np.float32, ...)` via
  `utils.min_weight_dtype`, used by both `conservative.format_weights` and
  `conservative_2d._result_dtype` (one source of truth for the precision floor).
- Extract the thrice-repeated rectilinear-coord-pair test into
  `_is_rectilinear_pair`, so "rectilinear" has one definition.
- Drop a redundant dtype-equality guard before `astype(copy=False)` in
  `_apply_core` (the cast already no-ops when dtypes match).

Pure refactors; no behavior change (106 passed, ruff + mypy unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stacks the s2geometry backend onto the rebuilt conservative_2d (PR xarray-contrib#70).
manifold="s2" intersects cells as great-circle polygons on the sphere using
the optional `spherely` package, plugging into the existing _GRID_BUILDERS
registry: an s2 grid carries spherely Geography cell polygons (s2_polys)
alongside planar shapely boxes (used only as the STRtree candidate-pair
bbox filter, since spherely has no spatial index), and
_build_intersection_areas dispatches to spherely.intersection /
spherely.area for s2-vs-s2 pairs (areas in steradians; the Earth radius
cancels through row normalization).

Adds the `spherical` install extra (spherely>=0.1.1), s2 tests (gated on
spherely via importorskip), and a spherical demo notebook wired into the
docs toctree.

Reconciled from the old spherely-integration lineage onto the new xarray-contrib#70
architecture: dropped the superseded conservative_polygon.py module — its
RegridderMetadata is replaced by RegridSpec / _conservative_2d_serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs force-pushed the spherely-integration branch from 4a9251b to f5c824a Compare May 30, 2026 01:32
thodson-usgs and others added 3 commits May 30, 2026 02:40
A recall-mode code review found manifold="s2" silently produced
hemisphere-sized *complement* polygons whenever lat or lon ran descending
(the CF 90->-90 convention): _s2_cell_polys emitted a fixed corner order
with oriented=True, so descending edges wound the ring clockwise and
spherely took each cell's complement (~4*pi sr instead of the cell). Every
descending-coordinate s2 regrid of a non-constant field was silently wrong;
constant fields and the mass-identity test passed because row-normalization
divides out the uniform complement scale.

Fix: build each cell's shell from per-axis (min, max) corners so the ring is
CCW regardless of coordinate direction (the planar shadow already used the
order-agnostic shapely.box). Covers both the create_polygon fallback and the
future vectorized spherely.polygons branch.

Also from the review:
- tests: a MODULE-level pytest.importorskip("spherely") skipped the ENTIRE
  test_conservative_2d.py (all 35 planar/cea/polygon/netcdf tests) when
  spherely is absent — the default CI env (spherely is only in the
  `spherical` extra). Gate the s2 tests with a per-test skipif marker.
- tests: add test_s2_cell_areas_are_not_complements (areas < 2*pi, sum ~ 4*pi
  for ascending AND descending lat) and
  test_s2_descending_latitude_matches_ascending — the orientation-invariant
  constant-field / mass-identity tests could not catch this class of bug.
- docs: drop the false "_s2_intersection_areas falls back to serial on older
  spherely" claim (there is no version detection); fix the "outer edges land
  exactly on +/-90" test comment; document the known s2 limitation that the
  planar bbox candidate filter is not a conservative superset of a great-circle
  cell (poleward bulge / antimeridian seam) — see the follow-up plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… s2 candidate search (Bug 2)

Replace the manifold dispatch (a _GRID_BUILDERS dict plus a hard-coded
three-way branch in _build_intersection_areas that ALWAYS used the planar bbox
STRtree) with a GeometryBackend strategy class per manifold: PlanarBackend,
CeaBackend (planar in projected space), and S2Backend. Each backend owns grid
construction, candidate-pair search, and the intersection kernel, so the
candidate search stays consistent with the coordinate system its cells live
in. This is the structural fix for the class of bug where s2 was bolted onto
planar infrastructure; a new manifold is now one subclass + one registry entry.

Correctness (Bug 2, found by code review): a planar lon/lat bbox is not a
conservative superset of a great-circle cell, so the shared bbox candidate
filter silently dropped real s2 overlaps at high latitude (geodesic edges
bulge poleward) and across the antimeridian — leaking mass, under-reporting
target_areas, mis-normalizing weights. S2Backend now:
  - builds a bulge-faithful planar shadow whose latitude bounds reach each
    cell's great-circle apex (_s2_shadow_boxes), and
  - queries the candidate STRtree with dst boxes shifted by 0 / ±360° so
    antimeridian-adjacent cells are paired.
The candidate set is now a conservative superset; spherely computes exact
areas and the area>0 filter drops the rest. Verified to machine precision
(3e-16) against a brute-force all-pairs reference at high latitude and across
the seam (previously up to ~8% high-lat / ~95% for a single seam cell of true
overlap was dropped).

Tests: add test_s2_high_latitude_coverage_is_complete and
test_s2_antimeridian_coverage_is_complete vs an independent all-pairs spherely
reference. The refactor is behavior-preserving for planar/cea (114 package
tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d s2 conservation test

/simplify cleanups on the s2 + backend code:
- extract `_resolve_manifold` so the "unknown manifold" message has one source
  of truth (was copy-pasted in `_check_manifold` and `polygons_from_coords`).
- extract `_threaded_pairwise` — the ThreadPoolExecutor skeleton shared verbatim
  by `_intersection_areas_threaded` (GEOS) and `_s2_intersection_areas`
  (spherely); each is now a small kernel + tuning constants.
- drop `predicate_filter` from the `GeometryBackend.area_matrix` ABC (a
  planar-only candidate-search knob); `PlanarBackend` widens it back in and
  `from_polygons` routes through the concretely-typed `_PLANAR` instance.
- add `PlanarBackend.grid_from_polygons` so all planar `_Grid` construction lives
  behind the backend (`from_polygons` no longer hand-builds `_Grid`).
- mark the unreleased vectorized `spherely.polygons` branch `# pragma: no cover`.
- clarify in `_candidate_pairs` that coord-convention reconciliation and the s2
  antimeridian-overlap recovery are deliberately at different layers.

Tests: replace the tautological `test_s2_manifold_conserves_mass` (a row-norm
identity that "conserves" any matrix) with `test_s2_conserves_known_integral` —
a field with a known spherical integral (4*pi), regridded onto a co-extensive
global grid, conserving to machine precision measured with INDEPENDENT
great-circle cell areas, and approaching the analytic value.

Behavior-preserving (114 package tests pass); ruff + mypy unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant