iscc-bio - ISCC Processing for Bioimage Data

ISCC Processing for Multi-Dimensional Bioimage Data

Generate ISO 24138:2024 International Standard Content Codes (ISCC) for bioimage data across multiple formats using deterministic IMAGEWALK plane traversal.

Project Status

Version 0.1.0

Warning

This package is a proof of concept, and breaking changes may be released at any time.

Overview

iscc-bio bridges bioimage formats with ISCC-CODE processing by implementing the IMAGEWALK specification, a deterministic algorithm for traversing and canonicalizing the pixel data of multi-dimensional bioimages. The same pixel data therefore yields the same reproducible content identifier, regardless of source format or storage platform.

Documentation: https://bio.iscc.codes

Key Features

Format-Agnostic Hashing: Generate reproducible ISCCs at the level of pixel data across OME-TIFF, OME-Zarr, OMERO, CZI, ND2, LIF, and other formats
IMAGEWALK Implementation: Deterministic Z→C→T plane traversal with canonical byte representation
Multi-Source Support: Process local files (via BioIO), OME-Zarr archives, and OMERO remote servers
Memory Efficient: Lazy loading with Dask for processing large multi-dimensional images
Multi-Scene Processing: Handle complex multi-scene/multi-series bioimage files
Command-Line Tools: CLI commands for code generation, view extraction, and thumbnails

Installation

Basic Installation

# Using uv (recommended)
uv tool install iscc-bio

# Using pip
pip install iscc-bio

Installation with Format Support

# Install with all bioimage reader plugins
uv tool install "iscc-bio[readers]"

# Install with specific format support
uv tool install "iscc-bio[czi,nd2,lif]"

# Install everything (all readers)
uv tool install "iscc-bio[all]"

OMERO Support

OMERO support depends on zeroc-ice, whose platform-specific prebuilt wheels are not available on PyPI. Install them separately from a checkout of the repository, which provides requirements-omero.txt:

pip install -r requirements-omero.txt

Available Optional Dependencies

readers / all: All BioIO reader plugins (BioFormats, CZI, OME-TIFF, OME-Zarr, ND2, LIF, etc.)
bioformats: BioFormats reader for broad format support
czi, nd2, lif, ome-tiff, ome-zarr-plugin, dv, tifffile: Individual format readers

Quick Start

CLI Commands

Generate Biocode (ISCC-SUM)

# Generate biocode (ISCC-SUM) for bioimage scenes
iscc-bio biocode myimage.ome.tiff

# Works with multiple sources:
iscc-bio biocode local/file.czi           # Local bioimage file
iscc-bio biocode data.zarr                # OME-Zarr/NGFF
iscc-bio biocode --host omero.server.com --iid 123  # OMERO server

# With per-plane simprints for similarity search:
iscc-bio biocode myimage.czi --simprints

Generate Imagecode (Experimental)

# Generate comprehensive fingerprint with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED
iscc-bio imagecode myimage.czi

# Output includes:
# - ISCC-SUM hash over normalized pixel content
# - Representative view extraction (~5 views per scene)
# - ISCC-IMAGE codes for each view
# - ISCC-MIXED global descriptor

Extract Representative Views

# Extract intelligent 2D views for perceptual hashing
iscc-bio views myimage.nd2 --output-dir ./views/

# Extraction strategies:
# - Maximum intensity projections (MIP)
# - Best focus planes
# - Representative sampling
# - Multi-channel composites

IMAGEWALK Specification

IMAGEWALK is a deterministic algorithm for traversing multi-dimensional bioimage data to produce format-agnostic, reproducible hash digests.

Core Principles

Z→C→T Traversal Order: Planes are processed in deterministic order:
- Outermost loop: Z dimension (depth/focal plane)
- Middle loop: C dimension (channel)
- Innermost loop: T dimension (time)
Canonical Byte Representation: Each 2D plane is:
- Flattened in row-major order (Y then X)
- Encoded as big-endian bytes
- Fed to a hash processor
Multi-Scene Independence: Each scene/series is processed separately, producing one hash per scene
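
The canonical byte representation above can be sketched with the standard library alone. The helper name `canonical_plane_bytes` and the `STRUCT_CODES` table are hypothetical and introduced only for illustration; iscc-bio itself works on numpy arrays, and its actual conversion code may differ.

```python
import struct

# Hypothetical lookup, not part of iscc-bio: maps a pixel type name to a
# struct format character (unsigned 8/16/32-bit integers, 32-bit float).
STRUCT_CODES = {"uint8": "B", "uint16": "H", "uint32": "I", "float32": "f"}


def canonical_plane_bytes(plane, pixel_type):
    """Flatten a 2D plane row by row and encode it as big-endian bytes."""
    # Row-major flattening: all X values of row 0, then row 1, and so on.
    flat = [value for row in plane for value in row]
    # ">" forces big-endian encoding regardless of the host byte order.
    fmt = f">{len(flat)}{STRUCT_CODES[pixel_type]}"
    return struct.pack(fmt, *flat)


plane = [[1, 2, 3], [256, 257, 258]]  # 2 rows (Y) x 3 columns (X)
data = canonical_plane_bytes(plane, "uint16")
print(data.hex(" ", 2))  # 0001 0002 0003 0100 0101 0102
```

The resulting bytes are what would be fed to the hash processor, plane after plane, in traversal order.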

Example Traversal

For an image with Z=2, C=3, T=2 dimensions (12 total planes):

Plane 1:  z=0, c=0, t=0    Plane 7:  z=1, c=0, t=0
Plane 2:  z=0, c=0, t=1    Plane 8:  z=1, c=0, t=1
Plane 3:  z=0, c=1, t=0    Plane 9:  z=1, c=1, t=0
Plane 4:  z=0, c=1, t=1    Plane 10: z=1, c=1, t=1
Plane 5:  z=0, c=2, t=0    Plane 11: z=1, c=2, t=0
Plane 6:  z=0, c=2, t=1    Plane 12: z=1, c=2, t=1
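
The table above can be reproduced with three nested loops. The sketch below is not taken from the iscc-bio source; `imagewalk_order` is a hypothetical name, and it assumes the dimension sizes are already known.

```python
from itertools import product


def imagewalk_order(size_z, size_c, size_t):
    """Yield (z, c, t) indices with Z outermost and T innermost."""
    # itertools.product varies its last argument fastest, which matches
    # three nested loops written as: for z ... for c ... for t ...
    yield from product(range(size_z), range(size_c), range(size_t))


order = list(imagewalk_order(size_z=2, size_c=3, size_t=2))
for number, (z, c, t) in enumerate(order, start=1):
    print(f"Plane {number}: z={z}, c={c}, t={t}")
```

Because T is the innermost loop, it changes on every step, while Z only advances once all channel and time combinations for the current depth have been visited.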

Implementation Modules

iw_bioio.py: BioIO-based implementation for local files
iw_ngff.py: OME-NGFF/Zarr implementation using ome-zarr-py
iw_blitz.py: OMERO Blitz implementation for remote servers

All implementations produce identical hashes for identical pixel data, conforming to the IMAGEWALK specification.
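
Why identical pixel data leads to identical hashes can be shown with a small experiment: the same values stored in two different byte orders hash identically once they are re-encoded canonically. This is only a sketch. The function names are hypothetical, the pixel type is fixed to unsigned 16-bit, and SHA-256 stands in for the actual ISCC-SUM hash processor.

```python
import hashlib
import struct


def decode_pixels(raw, byte_order, count):
    """Decode `count` unsigned 16-bit pixels stored in the given byte order."""
    return struct.unpack(f"{byte_order}{count}H", raw)


def canonical_digest(pixels):
    """Hash pixels after re-encoding them as big-endian bytes."""
    canonical = struct.pack(f">{len(pixels)}H", *pixels)
    # SHA-256 is only a stand-in for the real hash processor.
    return hashlib.sha256(canonical).hexdigest()


pixels = (7, 300, 65535, 0)
stored_little = struct.pack("<4H", *pixels)  # as one container might store it
stored_big = struct.pack(">4H", *pixels)     # as another container might

digest_a = canonical_digest(decode_pixels(stored_little, "<", 4))
digest_b = canonical_digest(decode_pixels(stored_big, ">", 4))
print(stored_little != stored_big, digest_a == digest_b)  # True True
```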

Command-Line Interface

`biocode` - Generate Biocode (ISCC-SUM)

Generate biocode (ISCC-SUM) for bioimage scenes:

iscc-bio biocode INPUT [OPTIONS]

Options:
  -s, --source [auto|bioio|omero|zarr]  Data source type
  --simprints                           Generate per-plane simprints
  --host TEXT                           OMERO server hostname
  --iid INTEGER                         OMERO image ID
  --fid INTEGER                         OMERO fileset ID

`imagecode` - Generate Imagecode (Experimental)

Create comprehensive bioimage fingerprints with ISCC-SUM + ISCC-IMAGE + ISCC-MIXED codes:

iscc-bio imagecode INPUT [OPTIONS]

Options:
  -o, --output-dir PATH    Save extracted view PNGs
  -n, --max-views INTEGER  Maximum views per scene (default: 5)

`views` - Extract Representative Views

Extract intelligent 2D views for perceptual hashing:

iscc-bio views INPUT [OPTIONS]

Options:
  -s, --strategies TEXT    View strategies (mip, best_focus, representative, composite)
  -n, --max-views INTEGER  Maximum views to extract (default: 8)
  -o, --output-dir PATH    Directory to save thumbnails
  --host TEXT              OMERO server hostname
  --iid INTEGER            OMERO image ID
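
As an illustration of the `mip` strategy, a maximum intensity projection keeps, for every (y, x) position, the brightest value found across all Z planes. The sketch below uses plain nested lists and a hypothetical function name; how iscc-bio implements the strategy internally is not shown here.

```python
def max_intensity_projection(z_stack):
    """Collapse a Z-stack (list of 2D planes) into one 2D plane.

    Each output pixel is the maximum of that pixel position across all Z.
    """
    return [
        [max(values) for values in zip(*rows)]  # per-pixel max across Z
        for rows in zip(*z_stack)               # same row from every plane
    ]


z_stack = [
    [[1, 9], [4, 0]],  # z=0
    [[5, 2], [3, 8]],  # z=1
]
print(max_intensity_projection(z_stack))  # [[5, 9], [4, 8]]
```

With a numpy array of shape (Z, Y, X), the same projection is `stack.max(axis=0)`.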

`scenes` - Extract Scene Thumbnails

Extract thumbnails from all scenes in a multi-scene file:

iscc-bio scenes INPUT

`thumb` - Extract Thumbnail

Extract a single representative thumbnail from a bioimage:

iscc-bio thumb INPUT

Python API

IMAGEWALK Plane Iteration

from iscc_bio.imagewalk import iter_planes_bioio, iter_planes_ngff
from iscc_bio.imagewalk import iter_planes_blitz_image

# Iterate over planes using BioIO
for plane in iter_planes_bioio("image.czi"):
    print(f"Scene {plane.scene_idx}, Z={plane.z_depth}, "
          f"C={plane.c_channel}, T={plane.t_time}")
    print(f"Shape: {plane.xy_array.shape}, dtype: {plane.xy_array.dtype}")

# Iterate over OME-Zarr planes
for plane in iter_planes_ngff("data.zarr"):
    # Process plane.xy_array (2D numpy array)
    pass

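# Iterate over OMERO planes (requires the separately installed OMERO dependencies)
from omero.gateway import BlitzGateway

conn = BlitzGateway("user", "pass", host="omero.server.com")
if not conn.connect():
    raise ConnectionError("Could not connect to the OMERO server")
try:
    image = conn.getObject("Image", 123)
    for plane in iter_planes_blitz_image(conn, image):
        # Process plane.xy_array
        pass
finally:
    # Always release the server session, even if iteration fails
    conn.close()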

Generate Biocode (ISCC-SUM)

from iscc_bio.biocode import generate_biocode
from iscc_bio.imagewalk import iter_planes_bioio

# Generate biocode for all scenes
planes = iter_planes_bioio("image.lif")
results = generate_biocode(planes)
print(results[0]["iscc_code"])  # ISCC-SUM for first scene

Generate Imagecode (Experimental)

from iscc_bio.imagecode import generate_imagecode, format_output

# Generate comprehensive fingerprints (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
fingerprints = generate_imagecode("image.nd2", max_views=5)

# Format output
output = format_output(fingerprints, "image.nd2")
print(output)

Supported Formats

Supported through the BioIO plugin ecosystem, with OME-Zarr and OMERO also handled by dedicated IMAGEWALK implementations:

OME-TIFF/TIFF: Multi-page TIFF with OME-XML metadata
OME-Zarr/NGFF: Next-generation file format
OMERO: Remote server access via Blitz gateway
CZI: Carl Zeiss Image format
ND2: Nikon NIS-Elements format
LIF: Leica Image File format
DV: DeltaVision format
BioFormats: 150+ formats via Bio-Formats Java library

Development

Setup Development Environment

# Clone repository
git clone https://github.com/bio-codes/iscc-bio.git
cd iscc-bio

# Install with all dependencies
uv sync --extra all

# Run CLI during development
uv run iscc-bio --help

Development Commands

# Run tests
uv run pytest

This project uses poethepoet for task automation:

# Format markdown files
uv run poe format-md

# Format code files
uv run poe format-code

# Build documentation
uv run poe docs-build

# Run all formatting and docs
uv run poe all

Architecture

Core Modules

iscc_bio.api: High-level Python API — biocode() entry point for all sources
iscc_bio.imagewalk: IMAGEWALK plane traversal implementations
- iw_bioio.py: BioIO implementation
- iw_ngff.py: OME-Zarr/NGFF implementation
- iw_blitz.py: OMERO Blitz implementation
- common.py: Plane data model and canonical byte conversion
iscc_bio.biocode: Biocode (ISCC-SUM) generation from IMAGEWALK plane iterators
iscc_bio.imagecode: Imagecode generation (ISCC-SUM + ISCC-IMAGE + ISCC-MIXED)
iscc_bio.views: Intelligent view extraction strategies
iscc_bio.cli: Command-line interface (thin wrapper around the Python API)

Design Principles

Lazy Loading: Uses Dask arrays for memory-efficient processing of large images
Format Agnostic: Identical processing logic across all formats via IMAGEWALK
Deterministic: Reproducible hashes across platforms and formats
Modular: Clean separation between traversal, canonicalization, and hashing
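
The separation between traversal, canonicalization, and hashing can be sketched as three small functions connected by a generator, so that only one plane is processed at a time. Everything here is hypothetical: the function names, the in-memory scene layout, the fixed unsigned 16-bit pixel type, and SHA-256 as a stand-in for ISCC-SUM.

```python
import hashlib
import struct
from itertools import product


def iter_planes(scenes):
    """Traversal: lazily yield (scene_idx, plane) with Z outermost, T innermost."""
    for scene_idx, scene in enumerate(scenes):
        size_z, size_c, size_t = scene["shape"]
        for z, c, t in product(range(size_z), range(size_c), range(size_t)):
            yield scene_idx, scene["planes"][(z, c, t)]


def canonical_bytes(plane):
    """Canonicalization: row-major flatten, big-endian unsigned 16-bit."""
    flat = [value for row in plane for value in row]
    return struct.pack(f">{len(flat)}H", *flat)


def hash_scenes(plane_iter):
    """Hashing: one incremental digest per scene (SHA-256 as a stand-in)."""
    hashers = {}
    for scene_idx, plane in plane_iter:
        if scene_idx not in hashers:
            hashers[scene_idx] = hashlib.sha256()
        hashers[scene_idx].update(canonical_bytes(plane))
    return [hashers[idx].hexdigest() for idx in sorted(hashers)]


def make_scene(offset):
    """Build a toy scene with Z=2, C=1, T=1 and 2x2 planes."""
    planes = {(z, 0, 0): [[offset + z, 0], [0, 0]] for z in range(2)}
    return {"shape": (2, 1, 1), "planes": planes}


digests = hash_scenes(iter_planes([make_scene(0), make_scene(10)]))
print(len(digests))  # 2, one digest per scene
```

Each stage can be swapped independently: a different source only needs its own `iter_planes`, while canonicalization and hashing stay unchanged.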

Funding

This work was supported through the Open Science Clusters’ Action for Research and Society (OSCARS) European project under grant agreement Nº101129751.

See: BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).

License

Apache License 2.0 - See LICENSE file for details.

Citation

If you use iscc-bio in your research, please cite:

@software{iscc_bio,
  title        = {bio-codes/iscc-bio: ISCC Processing for Bioimage Data},
  author       = {Pan, Titusz},
  year         = 2025,
  url          = {https://github.com/bio-codes/iscc-bio},
  note         = {Supported by OSCARS (Open Science Clusters' Action for Research and Society) under European Commission grant agreement Nº101129751},
  version      = {0.1.0}
}

Related Projects

iscc-sum - Fast ISCC Data-Code and Instance-Code hashing
iscc-core - ISCC Core Algorithms - Reference implementation
iscc-lib - ISCC Core Algorithms - Polyglot Rust implementation
BioIO - Bioimage reading library
OME-Zarr - Next-generation file format implementation
