[RFC] Proposal: Architecture Modernization – Native Kernels, Model Hub & Lazy Dependencies #3934

@krkawzq

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

I propose three architectural enhancements for Scanpy:

1. Native C++ Backends (now practical to ship)

I propose moving performance-critical kernels to native C++.

  • Feasibility: In the modern CI/CD era (GitHub Actions + cibuildwheel), cross-platform binary distribution is fully automated.
  • Maintenance: With AI-assisted coding, writing C++ kernels is much less of a burden. We can focus on algorithm design and automate much of the implementation work.
  • Proof: In PerturbLab/kernels, I implemented sparse matrix operators in pure C++ that significantly outperform Numba.

2. Dependency Hygiene: Lazy Imports & Vendoring

  • Lazy Loading: Heavy submodules (especially those requiring torch or particular plotting libraries) should be imported lazily, so that importing the core package does not pull them in.
  • Vendoring: Small utility functions should be "vendored" (inlined) rather than adding full package dependencies.
  • Benefit: This keeps the core lightweight and prevents "dependency hell."
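To make the lazy-loading idea concrete, here is a minimal sketch of a proxy that defers the real import until the first attribute access. `LazyModule` is a hypothetical helper written for this issue, not an existing Scanpy API, and `json` only stands in for a heavy dependency such as torch.

```python
import importlib


class LazyModule:
    """Hypothetical proxy: imports the wrapped module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module = None  # nothing is imported yet

    def _load(self):
        # Import once and cache the result.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str):
        # Only called when normal lookup fails, i.e. for attributes that
        # live on the real module. Guard the proxy's own fields so that a
        # half-initialised instance cannot recurse forever.
        if attr in ("_name", "_module"):
            raise AttributeError(attr)
        return getattr(self._load(), attr)


# "json" stands in for a heavy optional dependency.
heavy = LazyModule("json")
was_loaded_before_use = heavy._module is not None
encoded = heavy.dumps({"a": 1})  # first real use triggers the import
was_loaded_after_use = heavy._module is not None
```

A package could bind such proxies at module level so that `import scanpy` stays cheap and the cost of the heavy import is paid only by the code paths that need it. The standard library's `importlib.util.LazyLoader` is an alternative that works at the import-system level.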

3. A "Transformers-like" Model Hub

I propose adding a standardized sc.models interface.

  • In perturblab/models, I implemented a unified registry to manage, download, and deploy models (e.g., scGPT, GEARS) through a consistent API (config, model, io).
  • Scanpy is the ideal place to standardize this for the community.
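As a rough illustration of what a registry-style interface could look like, the sketch below registers model classes under string names and builds them from a shared config object. Every name here (`ModelConfig`, `ModelRegistry`, `MODELS`, `ToyModel`, `n_genes`) is hypothetical and chosen for illustration; it is not the PerturbLab implementation, and downloading weights is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical config shared by all registered models."""
    name: str
    n_genes: int = 2000


class ModelRegistry:
    """Hypothetical name -> factory mapping with a uniform create() call."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[ModelConfig], object]] = {}

    def register(self, name: str):
        # Used as a decorator on a model class or factory function.
        def decorator(factory):
            if name in self._factories:
                raise ValueError(f"model {name!r} is already registered")
            self._factories[name] = factory
            return factory
        return decorator

    def available(self) -> List[str]:
        return sorted(self._factories)

    def create(self, name: str, **overrides):
        if name not in self._factories:
            raise KeyError(
                f"unknown model {name!r}; available: {self.available()}"
            )
        return self._factories[name](ModelConfig(name=name, **overrides))


MODELS = ModelRegistry()


@MODELS.register("toy-model")
class ToyModel:
    """Placeholder model; a real entry would wrap e.g. scGPT or GEARS."""

    def __init__(self, config: ModelConfig) -> None:
        self.config = config


model = MODELS.create("toy-model", n_genes=500)
```

The point of the design is that callers only need a model name and optional overrides; where the weights come from and how the model is constructed stay behind the registry.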

Alternative Solutions

The alternative is to keep relying solely on Numba/Python for everything. That limits how far individual kernels can be optimized and makes it harder for the ecosystem to use low-level hardware acceleration (CUDA/C++) effectively.

Additional Context

My repository krkawzq/PerturbLab serves as a proof of concept for this architecture. It demonstrates that a strictly typed, high-performance (C++-backed), modular system can be built rapidly.

I am happy to discuss contributing the C++ kernels or the Model Hub design to help push this initiative forward. "The lower the level, the better the performance."
