feat(distributed): MPI/distributed bindings for Ginkgo #105
Draft
rho-novatron wants to merge 10 commits into Helmholtz-AI-Energy:main
Conversation
…nstructors

Move the duplicated __cuda_array_interface__ / buffer-protocol conversion logic from array.cpp and matrix.cpp into a single gko_array_from_pyobject<T>() template in utils.hpp. Both the gko::array(Executor, py::object) constructor and the sparse-matrix (Executor, py::object) constructor now delegate to this helper, eliminating ~260 lines of duplicated CUDA/host branching.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
pybind11's `format_descriptor<int64_t>::format()` returns 'q' (long long) but numpy's `np.int64` reports 'l' (long) on x86_64 Linux, while on Windows the relationship reverses for 32-bit ints. Same physical layout, different format chars — `check_buffer_dtype` was rejecting these as 'Incompatible dtypes'. Treat any pair of integer format chars as compatible when both are single-character signed (or both unsigned) and itemsize matches the expected ValueType. This makes Buffer-protocol conversions work uniformly across platforms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
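The relaxed compatibility rule described above can be sketched with the stdlib struct module; the function name and exact rule here are illustrative, not the PR's actual check_buffer_dtype code:

```python
import struct

SIGNED_INT_FMTS = set("bhilq")    # buffer-protocol signed integer format chars
UNSIGNED_INT_FMTS = set("BHILQ")  # and their unsigned counterparts

def int_formats_compatible(fmt_a, fmt_b, expected_itemsize):
    """Treat two integer format chars as the same dtype when they share
    signedness and both occupy expected_itemsize bytes (hypothetical helper
    mirroring the relaxation described above)."""
    if len(fmt_a) != 1 or len(fmt_b) != 1:
        return False
    both_signed = fmt_a in SIGNED_INT_FMTS and fmt_b in SIGNED_INT_FMTS
    both_unsigned = fmt_a in UNSIGNED_INT_FMTS and fmt_b in UNSIGNED_INT_FMTS
    if not (both_signed or both_unsigned):
        return False
    # itemsize is what actually matters; the letter is platform spelling
    return (struct.calcsize(fmt_a) == expected_itemsize
            and struct.calcsize(fmt_b) == expected_itemsize)
```

On x86_64 Linux this accepts the ('q', 'l') pair for 8-byte integers while still rejecting signed/unsigned mixes and size mismatches.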
rho-novatron added a commit to rho-novatron/pyGinkgo that referenced this pull request on Apr 18, 2026
These bindings reuse the polymorphic gko::LinOp base, so the same factories transparently accept both single-process matrices and distributed::Matrix. Each solver returns a (logger, x) tuple from apply() so callers can introspect convergence (residual norm, iteration count) -- the standard Convergence logger pattern matching the existing GMRES binding. Addresses NovaPIC PR Helmholtz-AI-Energy#105 audit items 8 (Jacobi preconditioner for distributed) and 9 (solver introspection). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
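The (logger, x) return convention can be illustrated with a self-contained stand-in; the Convergence class and jacobi_apply below are illustrative, not the bound API:

```python
import numpy as np

class Convergence:
    """Illustrative stand-in for Ginkgo's Convergence logger."""
    def __init__(self, iterations, residual_norm):
        self.iterations = iterations
        self.residual_norm = residual_norm

def jacobi_apply(A, b, tol=1e-10, max_iter=200):
    # Mimics the binding's calling convention: run the solve, then hand
    # back (logger, x) so the caller can introspect convergence.
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = np.zeros_like(b)
    res = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        x = (b - R @ x) / D
        res = np.linalg.norm(b - A @ x)
        if res < tol:
            break
    return Convergence(k, res), x

# small diagonally dominant SPD system, so Jacobi converges
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
logger, x = jacobi_apply(A, b)
```

The caller unpacks the tuple and reads iteration count and residual norm off the logger, which is the introspection pattern the commit message describes.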
rho-novatron added a commit to rho-novatron/pyGinkgo that referenced this pull request on Apr 18, 2026
…cations

- Vector_<T>.from_local_array_view: zero-copy variant of from_local_array. On a CudaExecutor with a __cuda_array_interface__-backed input, the resulting distributed.Vector aliases the caller's buffer instead of copying. Uses py::keep_alive<0,4> to tie the input lifetime to the returned vector.
- Vector_<T>.gather_on_root(root=0): gather a distributed.Vector onto a single rank as a host numpy array; returns None on non-root ranks. Uses MPI gather + gather_v with the local Dense slice as the source.
- Matrix_<T,L,G>.create_from_local_and_non_local: clarify in the docstring that recv_connections holds GLOBAL column ids and that the non_local_linop's local column index is the position into recv_connections, ordered by source rank, then ascending global id.
- Add 2-rank smoke tests for cg/bicgstab on a distributed.Matrix and for the new Vector helpers (view parity + gather correctness).

Addresses NovaPIC PR Helmholtz-AI-Energy#105 audit items 5 (off-diag column convention), 13 (zero-copy CuPy <-> distributed.Vector), and 14 (gather to host).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
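The view-vs-copy distinction introduced by from_local_array_view mirrors NumPy's own aliasing semantics; a minimal host-side analogy (this is not the binding itself):

```python
import numpy as np

buf = np.zeros(4, dtype=np.float64)  # the caller's buffer

view = buf[:]        # from_local_array_view-style: aliases the same storage
copy = buf.copy()    # from_local_array-style: owns an independent copy

buf[0] = 42.0        # caller mutates the original buffer afterwards
# the view observes the write; the copy does not
```

In the binding, py::keep_alive plays the role of NumPy's base reference: it keeps the caller's buffer alive for as long as the aliasing vector exists.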
- Use find_package(Ginkgo) instead of find_package(ginkgo) to match the upstream GinkgoConfig.cmake naming on case-sensitive filesystems
- Guard install(IMPORTED_RUNTIME_ARTIFACTS) with if(NOT Ginkgo_FOUND), since the bare targets (ginkgo, ginkgo_device, ...) only exist when Ginkgo is built from source via FetchContent, not when using a pre-installed package
The install(DIRECTORY) used an absolute DESTINATION (${Python_SITELIB})
which bypasses CMAKE_INSTALL_PREFIX. Since py-build-cmake uses a staging
directory as the install prefix to collect files for the wheel, the
absolute path wrote files directly to the real site-packages instead of
the staging area, resulting in wheels missing the compiled .so binding.
- Use relative DESTINATION (${PY_BUILD_CMAKE_MODULE_NAME}) so files are
installed under CMAKE_INSTALL_PREFIX where py-build-cmake picks them up
- Add trailing / to source DIRECTORY to install contents, not the
directory itself (avoids pyGinkgo/pyGinkgo/ nesting)
Adds an opt-in MPI/distributed surface to pyGinkgo, gated by a new
`pyGinkgo_BUILD_MPI` CMake option (OFF by default — the serial build
is unchanged). When enabled, the module gains:
pyGinkgo.pyGinkgoBindings.mpi
Communicator wraps gko::experimental::mpi::communicator
around an mpi4py.MPI.Comm (no MPI_Comm_dup)
map_rank_to_device_id, is_gpu_aware
BUILD_MPI_IMPL / BUILD_MPI_LIBRARY_VERSION
runtime_mpi_library_version(), verify_abi(comm)
pyGinkgo.pyGinkgoBindings.distributed
Partition_<L>_<G> build_from_global_size_uniform / contiguous
/ mapping
Vector_<T> create / from_local_array(_deduce_size),
fill, scale, add_scaled, compute_dot,
compute_norm{1,2}, get_local_vector,
shape, local_shape
Matrix_<T>_<L>_<G> create_empty / create_from_local_linop /
create_from_local_and_non_local,
get_(non_)local_matrix, shape
PyLinOp pybind11 trampoline so Python subclasses
can implement matrix-free LinOps; the
alias type is registered correctly so
Python-side overrides of apply_impl are
invoked from Krylov solvers.
pyGinkgo.distributed (Python facade)
Partition / DistributedVector / DistributedMatrix / PyLinOp
plus a lazy MPI-ABI verification (build_impl vs runtime_impl plus
a C++-side MPI_Comm_size round-trip on first use of any entry
point that takes a communicator).
The existing solver bindings (`solver.gmres_<T>`, `solver.direct`,
etc.) accept any `gko::LinOp` polymorphically and therefore work
unchanged with a distributed Matrix or a PyLinOp; no new solver or
preconditioner bindings are introduced in this PR.
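The trampoline pattern that makes this dispatch work can be sketched in plain Python; the class names below are illustrative, and the real base is the pybind11-bound gko::LinOp:

```python
import numpy as np

class LinOp:
    """Stand-in for the bound LinOp base: solvers only ever call apply(),
    which dispatches to the virtual apply_impl()."""
    def apply(self, b, x):
        self.apply_impl(b, x)

class DiagonalOp(LinOp):
    """Matrix-free operator: a Python subclass overrides apply_impl, which
    is what the PyLinOp trampoline makes reachable from C++ solvers."""
    def __init__(self, diag):
        self.diag = np.asarray(diag, dtype=np.float64)

    def apply_impl(self, b, x):
        x[:] = self.diag * b  # y = D @ b, no assembled matrix needed

op = DiagonalOp([2.0, 3.0])
b = np.array([1.0, 1.0])
x = np.empty_like(b)
op.apply(b, x)  # base-class entry point reaches the Python override
```

In the real bindings the equivalent indirection crosses the C++/Python boundary: the Krylov solver holds a gko::LinOp* and the trampoline routes apply_impl back into the Python subclass.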
Build-time / runtime safety:
* `cmake/FindMpi4py.cmake` locates the active interpreter's mpi4py
headers; mismatches between the MPI mpi4py was built against and
the one CMake selected emit a WARNING.
* `cmake/DetectMpiAbi.cmake` bakes `MPI_Get_library_version()` and
the implementation flavor (MPICH/OpenMPI/IntelMPI) into a generated
header, so the runtime check can give a precise error if mpi4py
loads a different MPI.
* `pyGinkgo.distributed` raises ImportError immediately if the C
extension was built without MPI or if mpi4py is missing; the ABI
round-trip happens lazily on first communicator use.
`pyproject.toml` adds an optional `mpi` extra pulling in mpi4py>=3.1.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds pytest modules exercising pyGinkgo.distributed end-to-end under
mpirun -n >= 2. The shared conftest skips automatically on a
single rank and barriers between tests:
tests/cpp_bindings/distributed/
conftest.py comm/exec/rank/nprocs fixtures
test_communicator.py mpi4py.Comm bridging + ABI verification
test_partition.py Partition.uniform / from_contiguous /
from_mapping
test_vector.py from_local_array, fill, norm, dot,
get_local_vector
test_matrix.py local-only identity CSR + apply
test_pylinop.py Python LinOp subclass override is invoked
test_solver.py distributed GMRES on a block-diagonal SPD
(uses the existing serial GMRES binding,
which dispatches polymorphically)
tests/pyGinkgo/distributed/
test_facade.py high-level Python facade
Verified passing on 2 and 4 ranks (22 tests).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rho-novatron force-pushed from 5b8dc67 to 2149843
… helpers

- PyLinOp docstring + README spell out that the apply_impl callback receives distributed.Vector inputs (local block only) and is responsible for halo exchange; recommend a CuPy stream sync.
- README documents the (Convergence, x) tuple returned by *.apply() and the new from_local_array_view / gather_on_root helpers.

Addresses NovaPIC PR Helmholtz-AI-Energy#105 audit items 11 (PyLinOp signature) and 12 (stream safety).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collaborator
I worked on bindings for the distributed backend a while back. Maybe it's worth checking this branch as well: https://github.com/Helmholtz-AI-Energy/pyGinkgo/tree/dist/mpi
PR-B: feat(distributed): MPI/distributed bindings for Ginkgo
Branch (fork): rho-novatron:rho/distributed
Base (upstream): Helmholtz-AI-Energy:refactor/array-helper (i.e. stacked on PR-A)
Commits: 4 (a46f8cf, ec40b72, d440809, 75b0816)

Summary
Adds first-class Python bindings for Ginkgo's MPI-distributed types, exposed under the new pyGinkgo.distributed submodule. Build is opt-in: setting pyGinkgo_BUILD_MPI=ON (defaults to OFF) configures Ginkgo with GINKGO_BUILD_MPI=ON and compiles the new bindings. Serial builds and the existing public API are completely unchanged.
What's exposed

gko::experimental::mpi::communicator -> pyGinkgo.distributed.Communicator
gko::experimental::distributed::Partition -> pyGinkgo.distributed.Partition_int32_int64
gko::experimental::distributed::Vector -> pyGinkgo.distributed.Vector_double etc.
gko::experimental::distributed::Matrix -> pyGinkgo.distributed.Matrix_double_int32_int64
PyLinOp for distributed solves -> pyGinkgo.distributed.PyLinOp_double

mpi4py is the supported way to construct a Communicator from Python; bindings accept MPI.Comm directly.

Why no new solver bindings?
Existing solvers (gmres_double, cg_double, …) already accept any gko::LinOp polymorphically through their generated apply paths. A distributed Matrix is a LinOp, so existing bindings dispatch correctly without modification — verified in tests/cpp_bindings/distributed/test_solver.py.

ABI safety
MPI ABI compatibility between Ginkgo (built against MPICH headers in the conda package) and the user's mpi4py (also built against MPICH) is checked lazily the first time a Communicator is constructed:

- The implementation flavor (mpich, openmpi, …) is baked into the binding via a CMake-defined macro PYGINKGO_MPI_IMPL.
- In Communicator.__init__, mpi4py.MPI.Get_library_version() is parsed and compared against the baked value.
- A round-trip C++ check (broadcast a sentinel value from rank 0) confirms the two sides actually agree on MPI_COMM_WORLD.

Mismatches raise a clear RuntimeError instead of segfaulting.

What is intentionally deferred
- Schwarz: preconditioners are exposed as already-generated LinOps, but Schwarz fundamentally requires a LinOpFactory, because its generate(distributed_matrix) step constructs the per-rank local solver against the local block. Adding factory bindings is a cross-cutting design change that deserves its own discussion and PR.
- The existing Ginkgo logger interface still works.
- read_distributed matrix-market reader: removed as a stub; can return when there's a tested implementation.
Testing

22 tests, run with both 2 and 4 ranks:

mpirun -n 2 python -m pytest \
    tests/cpp_bindings/distributed/ tests/pyGinkgo/distributed/
mpirun -n 4 python -m pytest \
    tests/cpp_bindings/distributed/ tests/pyGinkgo/distributed/

Coverage:
Cross-platform note

The Windows fix in a46f8cf is necessary because long is 4 bytes on Windows (vs 8 on Linux/macOS). The gko_array_from_pyobject helper added in PR-A used a hard-coded format-char comparison; the fix accepts any format char with a matching sizeof() and signedness, so the same Python integer arrays work on all three platforms.
Stacked PR

Stacked on top of PR-A (refactor/array-helper). Reviewing PR-A first is recommended; the bulk of the new code in this PR lives in src/cpp_bindings/distributed/ and src/cpp_bindings/mpi/, neither of which is touched by PR-A.