
Add --compact option to reduce array dimensionality #279

Merged
sbesson merged 5 commits into glencoesoftware:master from melissalinkert:compact-dimensions
Aug 27, 2025

Conversation

@melissalinkert
Member

The default behavior should be unchanged, but using the --compact option should eliminate any dimension with length 1. As indicated in the test, something like bioformats2raw --compact "test&sizeZ=3.fake" test.zarr should end up with a 3D array, instead of a 5D array.
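
To make the intended behavior concrete, here is a minimal sketch of the reduction in Python. It is not taken from the bioformats2raw source: the `compact` helper and the `AXES` constant are hypothetical names, and the TCZYX axis order is an assumption based on the default array layout.

```python
# Hypothetical illustration only; not the bioformats2raw implementation.
# Assumes the default 5D array order is t, c, z, y, x.
AXES = ("t", "c", "z", "y", "x")

def compact(shape, axes=AXES):
    """Drop every axis whose length is 1; return the (axes, shape) that remain."""
    if len(shape) != len(axes):
        raise ValueError("shape and axes must have the same length")
    kept = [(a, n) for a, n in zip(axes, shape) if n != 1]
    return tuple(a for a, _ in kept), tuple(n for _, n in kept)

# "test&sizeZ=3.fake": one timepoint, one channel, 3 Z sections of 512x512
axes, shape = compact((1, 1, 3, 512, 512))
print(axes, shape)  # ('z', 'y', 'x') (3, 512, 512)
```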

I've done some testing locally, but more testing in particular with non-fake data is in order. Maybe also worth thinking about whether X and Y should always be included, or if there are any other restrictions that should be put on this option.

cc @mabruce @erindiel @joshmoore @dominikl @jburel

@sbesson
Member

sbesson commented Aug 15, 2025

Started to test this in the context of glencoesoftware/omero-zarr-pixel-buffer#28 (comment). The shape of the arrays generated with/without --compact are certainly compatible with my expectations.

$ for i in $(ls -1); do grep shape $i/0/*/.zarray; done
XYCT_512x512x5x20.zarr/0/0/.zarray:  "shape" : [ 20, 5, 512, 512 ],
XYCT_512x512x5x20.zarr/0/1/.zarray:  "shape" : [ 20, 5, 256, 256 ],
XYC_512x512x5.zarr/0/0/.zarray:  "shape" : [ 5, 512, 512 ],
XYC_512x512x5.zarr/0/1/.zarray:  "shape" : [ 5, 256, 256 ],
XYC_8192x8192x3.zarr/0/0/.zarray:  "shape" : [ 3, 8192, 8192 ],
XYC_8192x8192x3.zarr/0/1/.zarray:  "shape" : [ 3, 4096, 4096 ],
XYC_8192x8192x3.zarr/0/2/.zarray:  "shape" : [ 3, 2048, 2048 ],
XYC_8192x8192x3.zarr/0/3/.zarray:  "shape" : [ 3, 1024, 1024 ],
XYC_8192x8192x3.zarr/0/4/.zarray:  "shape" : [ 3, 512, 512 ],
XYC_8192x8192x3.zarr/0/5/.zarray:  "shape" : [ 3, 256, 256 ],
XYT_512x512x20.zarr/0/0/.zarray:  "shape" : [ 20, 512, 512 ],
XYT_512x512x20.zarr/0/1/.zarray:  "shape" : [ 20, 256, 256 ],
XYZCT_512x512x1x1x1.zarr/0/0/.zarray:  "shape" : [ 1, 1, 1, 512, 512 ],
XYZCT_512x512x1x1x1.zarr/0/1/.zarray:  "shape" : [ 1, 1, 1, 256, 256 ],
XYZCT_512x512x1x1x20.zarr/0/0/.zarray:  "shape" : [ 20, 1, 1, 512, 512 ],
XYZCT_512x512x1x1x20.zarr/0/1/.zarray:  "shape" : [ 20, 1, 1, 256, 256 ],
XYZCT_512x512x1x5x1.zarr/0/0/.zarray:  "shape" : [ 1, 5, 1, 512, 512 ],
XYZCT_512x512x1x5x1.zarr/0/1/.zarray:  "shape" : [ 1, 5, 1, 256, 256 ],
XYZCT_512x512x1x5x20.zarr/0/0/.zarray:  "shape" : [ 20, 5, 1, 512, 512 ],
XYZCT_512x512x1x5x20.zarr/0/1/.zarray:  "shape" : [ 20, 5, 1, 256, 256 ],
XYZCT_512x512x50x1x1.zarr/0/0/.zarray:  "shape" : [ 1, 1, 50, 512, 512 ],
XYZCT_512x512x50x1x1.zarr/0/1/.zarray:  "shape" : [ 1, 1, 50, 256, 256 ],
XYZCT_512x512x50x1x20.zarr/0/0/.zarray:  "shape" : [ 20, 1, 50, 512, 512 ],
XYZCT_512x512x50x1x20.zarr/0/1/.zarray:  "shape" : [ 20, 1, 50, 256, 256 ],
XYZCT_512x512x50x5x1.zarr/0/0/.zarray:  "shape" : [ 1, 5, 50, 512, 512 ],
XYZCT_512x512x50x5x1.zarr/0/1/.zarray:  "shape" : [ 1, 5, 50, 256, 256 ],
XYZCT_512x512x50x5x20.zarr/0/0/.zarray:  "shape" : [ 20, 5, 50, 512, 512 ],
XYZCT_512x512x50x5x20.zarr/0/1/.zarray:  "shape" : [ 20, 5, 50, 256, 256 ],
XYZCT_512x512x50x5x20_2.zarr/0/0/.zarray:  "shape" : [ 20, 5, 50, 512, 512 ],
XYZCT_512x512x50x5x20_2.zarr/0/1/.zarray:  "shape" : [ 20, 5, 50, 256, 256 ],
XYZCT_8192x8192x1x3x1.zarr/0/0/.zarray:  "shape" : [ 1, 3, 1, 8192, 8192 ],
XYZCT_8192x8192x1x3x1.zarr/0/1/.zarray:  "shape" : [ 1, 3, 1, 4096, 4096 ],
XYZCT_8192x8192x1x3x1.zarr/0/2/.zarray:  "shape" : [ 1, 3, 1, 2048, 2048 ],
XYZCT_8192x8192x1x3x1.zarr/0/3/.zarray:  "shape" : [ 1, 3, 1, 1024, 1024 ],
XYZCT_8192x8192x1x3x1.zarr/0/4/.zarray:  "shape" : [ 1, 3, 1, 512, 512 ],
XYZCT_8192x8192x1x3x1.zarr/0/5/.zarray:  "shape" : [ 1, 3, 1, 256, 256 ],
XYZC_512x512x50x5.zarr/0/0/.zarray:  "shape" : [ 5, 50, 512, 512 ],
XYZC_512x512x50x5.zarr/0/1/.zarray:  "shape" : [ 5, 50, 256, 256 ],
XYZT_512x512x50x20.zarr/0/0/.zarray:  "shape" : [ 20, 50, 512, 512 ],
XYZT_512x512x50x20.zarr/0/1/.zarray:  "shape" : [ 20, 50, 256, 256 ],
XYZ_512x512x50.zarr/0/0/.zarray:  "shape" : [ 50, 512, 512 ],
XYZ_512x512x50.zarr/0/1/.zarray:  "shape" : [ 50, 256, 256 ],
XY_512x512.zarr/0/0/.zarray:  "shape" : [ 512, 512 ],
XY_512x512.zarr/0/1/.zarray:  "shape" : [ 256, 256 ],
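
For anyone who prefers not to grep, the same check could be scripted. This is a rough sketch assuming the Zarr v2 layout shown above (`<name>.zarr/0/<level>/.zarray`); `array_shapes` is a hypothetical helper name, not part of any library.

```python
import json
from pathlib import Path

def array_shapes(root):
    """Map each <name>.zarr/0/<level>/.zarray file under root to its "shape"."""
    root = Path(root)
    shapes = {}
    for zarray in sorted(root.glob("*.zarr/0/*/.zarray")):
        with open(zarray) as f:
            # .zarray is plain JSON in Zarr v2; "shape" is a list of ints
            shapes[zarray.relative_to(root).as_posix()] = json.load(f)["shape"]
    return shapes

if __name__ == "__main__":
    for path, shape in array_shapes(".").items():
        print(path, shape)
```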

Although it's an edge case for the kind of data this converter is typically used with, the NGFF 0.4 specification expects that [T]he "axes" MUST contain 2 or 3 entries of "type:space", so I would vote for putting a restriction on the reduction of the X and Y axes.

@sbesson sbesson linked an issue Aug 18, 2025 that may be closed by this pull request
Member

@sbesson sbesson left a comment

This PR has been extensively tested in the context of glencoesoftware/omero-zarr-pixel-buffer#28 (review) using both synthetic sample files of all supported dimension orders that are subsets of XYZCT and real-world datasets in various modalities (high-content screening, brightfield and fluorescence whole slide imaging, fluorescence 3D imaging, segmentations).

In all scenarios, data was generated with the correct hierarchy and metadata both at the multiscales group level as well as at the individual array level. Validation through the OMERO application was consistent with and without the --compact option.

The only code request is whether the unit tests could be extended to cover more scenarios, at least the 2D (yx) and one 4D scenario, so that we can be defensive against regressions.

From the command-line perspective, I think --compact is a good option. The only other one I was considering was --reduce or even --reduce-dimensions.
Are Zarr datasets generated with reduced dimensions expected to remain compatible with raw2ometiff (probably assuming corresponding appropriate changes)?

@jburel
Contributor

jburel commented Aug 18, 2025

If I get things right, we consider the base to be yx, i.e. not handling cases like yz, or xz.
Slice views could potentially be saved in that format, but this is probably an edge case.

@sbesson
Member

sbesson commented Aug 18, 2025

Just tested, and 2D planes whose spatial axes are not strictly yx are also supported:

sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % ./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeX=1&sizeY=512&sizeZ=512.fake" test.zarr --compact
sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % cat test.zarr/0/.zattrs 
{
  "multiscales" : [ {
    "metadata" : {
      "method" : "loci.common.image.SimpleImageScaler",
      "version" : "Bio-Formats 8.4.0-SNAPSHOT"
    },
    "axes" : [ {
      "name" : "z",
      "type" : "space"
    }, {
      "name" : "y",
      "type" : "space"
    } ],
    "name" : "test",
    "datasets" : [ {
      "path" : "0",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "1",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 2.0 ],
        "type" : "scale"
      } ]
    } ],
    "version" : "0.4"
  } ],
  "omero" : {
    "channels" : [ {
      "color" : "808080",
      "coefficient" : 1,
      "active" : true,
      "label" : "Channel 0",
      "window" : {
        "min" : 0.0,
        "max" : 0.0,
        "start" : 0.0,
        "end" : 0.0
      },
      "family" : "linear",
      "inverted" : false
    } ],
    "rdefs" : {
      "defaultT" : 0,
      "model" : "greyscale",
      "defaultZ" : 256
    }
  }
}
sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % cat test.zarr/0/0/.zarray 
{
  "chunks" : [ 1, 512 ],
  "compressor" : {
    "clevel" : 5,
    "blocksize" : 0,
    "shuffle" : 1,
    "cname" : "lz4",
    "id" : "blosc"
  },
  "dtype" : "|u1",
  "fill_value" : 0,
  "filters" : null,
  "order" : "C",
  "shape" : [ 512, 512 ],
  "dimension_separator" : "/",
  "zarr_format" : 2
}

Incidentally, ./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeX=1&sizeY=512.fake" test2.zarr --compact will also work and produce an array with only one dimension. In line with the specification, we could probably add a check to enforce that there are at least 2 spatial axes.

@jburel
Contributor

jburel commented Aug 18, 2025

Interestingly, running
./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeY=512&sizeZ=512.fake" test.zarr --compact
generates

 cat test.zarr/0/.zattrs                                                                             
{
  "multiscales" : [ {
    "metadata" : {
      "method" : "loci.common.image.SimpleImageScaler",
      "version" : "Bio-Formats 8.3.0"
    },
    "axes" : [ {
      "name" : "z",
      "type" : "space"
    }, {
      "name" : "y",
      "type" : "space"
    }, {
      "name" : "x",
      "type" : "space"
    } ],
    "name" : "test",
    "datasets" : [ {
      "path" : "0",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "1",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 2.0, 2.0 ],
        "type" : "scale"
      } ]
    } ],
    "version" : "0.4"
  } ],
  "omero" : {
    "channels" : [ {
      "color" : "808080",
      "coefficient" : 1,
      "active" : true,
      "label" : "Channel 0",
      "window" : {
        "min" : 0.0,
        "max" : 255.0,
        "start" : 0.0,
        "end" : 255.0
      },
      "family" : "linear",
      "inverted" : false
    } ],
    "rdefs" : {
      "defaultT" : 0,
      "model" : "greyscale",
      "defaultZ" : 256
    }
  }
}

but

./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeX=1&sizeY=512&sizeZ=512.fake" test.zarr --compact

generates

cat test.zarr/0/.zattrs                                                                                     
{
  "multiscales" : [ {
    "metadata" : {
      "method" : "loci.common.image.SimpleImageScaler",
      "version" : "Bio-Formats 8.3.0"
    },
    "axes" : [ {
      "name" : "z",
      "type" : "space"
    }, {
      "name" : "y",
      "type" : "space"
    } ],
    "name" : "test",
    "datasets" : [ {
      "path" : "0",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "1",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 2.0 ],
        "type" : "scale"
      } ]
    } ],
    "version" : "0.4"
  } ],
  "omero" : {
    "channels" : [ {
      "color" : "808080",
      "coefficient" : 1,
      "active" : true,
      "label" : "Channel 0",
      "window" : {
        "min" : 0.0,
        "max" : 0.0,
        "start" : 0.0,
        "end" : 0.0
      },
      "family" : "linear",
      "inverted" : false
    } ],
    "rdefs" : {
      "defaultT" : 0,
      "model" : "greyscale",
      "defaultZ" : 256
    }
  }
}

@jburel
Contributor

jburel commented Aug 18, 2025

This is the expected behaviour, since sizeX defaults to 512 if it is not specified.

@sbesson
Member

sbesson commented Aug 18, 2025

That makes sense. For the Bio-Formats synthetic images, sizeX and sizeY are 512 by default (i.e. if unspecified in the filename/INI file) and all other dimensions are set to 1 by default - https://bio-formats.readthedocs.io/en/latest/developers/generating-test-images.html#key-value-pairs
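
As a rough illustration of those defaults, the sketch below resolves the five dimension sizes from a fake filename. It is a simplification, not the FakeReader parser: `fake_sizes` and `DEFAULTS` are hypothetical names, and only the five size keys discussed here are handled.

```python
# Simplified sketch; the real FakeReader supports many more keys.
# Defaults as described above: X and Y are 512, all other dimensions are 1.
DEFAULTS = {"sizeX": 512, "sizeY": 512, "sizeZ": 1, "sizeC": 1, "sizeT": 1}

def fake_sizes(filename):
    """Return the dimension sizes implied by a name like 'test&sizeZ=3.fake'."""
    stem = filename[:-len(".fake")] if filename.endswith(".fake") else filename
    sizes = dict(DEFAULTS)
    for pair in stem.split("&")[1:]:  # the first token is the image name
        key, _, value = pair.partition("=")
        if key in sizes:
            sizes[key] = int(value)
    return sizes

print(fake_sizes("test&sizeY=512&sizeZ=512.fake"))
```

With these defaults, omitting sizeX leaves three axes longer than 1 (z, y, x), while sizeX=1 leaves only two (z, y), which matches the two outputs above.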

@melissalinkert melissalinkert requested a review from sbesson August 18, 2025 22:46
Member

@sbesson sbesson left a comment

With the latest changes, the following command (and any input leading to fewer than 2 spatial dimensions) now errors with a meaningful message:

sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % ./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeY=1.fake" test.zarr
sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % ./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeY=1.fake" test_compact.zarr --compact
2025-08-19 08:52:39,399 [main] ERROR c.g.bioformats2raw.Converter - Error while writing series 0
java.lang.IllegalArgumentException: Found 1 spatial dimensions, try again without --compact
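
The rule being enforced can be sketched as follows. This is a Python illustration of the check as I understand it from the message above, not the Java code in the PR; `check_spatial_axes` and `SPATIAL` are hypothetical names.

```python
# Hypothetical sketch of the check; the actual implementation lives in the Java Converter.
SPATIAL = {"x", "y", "z"}

def check_spatial_axes(axes):
    """Raise if fewer than two of the axes left after compaction are spatial."""
    count = sum(1 for a in axes if a in SPATIAL)
    if count < 2:
        raise ValueError(
            f"Found {count} spatial dimensions, try again without --compact"
        )
    return count

check_spatial_axes(["z", "y"])   # accepted: two spatial axes remain
try:
    check_spatial_axes(["x"])    # "test&sizeY=1.fake" leaves only x
except ValueError as e:
    print(e)
```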

The tests cover all scenarios I can think of at present. The README should probably be updated with some documentation around the --compact functionality but this can be done in a follow-up PR.
@jburel anything else from your side?

@jburel
Contributor

jburel commented Aug 19, 2025

I tested the following

./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeZ=1.fake" test.zarr --compact 
(gradle6) jmarie@MacBookPro bioformats2raw % cat test.zarr/0/.zattrs                                                                 
{
  "multiscales" : [ {
    "metadata" : {
      "method" : "loci.common.image.SimpleImageScaler",
      "version" : "Bio-Formats 8.3.0"
    },
    "axes" : [ {
      "name" : "y",
      "type" : "space"
    }, {
      "name" : "x",
      "type" : "space"
    } ],
    "name" : "test",
    "datasets" : [ {
      "path" : "0",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "1",
      "coordinateTransformations" : [ {
        "scale" : [ 2.0, 2.0 ],
        "type" : "scale"
      } ]
    } ],
    "version" : "0.4"
  } ],
  "omero" : {
    "channels" : [ {
      "color" : "808080",
      "coefficient" : 1,
      "active" : true,
      "label" : "Channel 0",
      "window" : {
        "min" : 0.0,
        "max" : 255.0,
        "start" : 0.0,
        "end" : 255.0
      },
      "family" : "linear",
      "inverted" : false
    } ],
    "rdefs" : {
      "defaultT" : 0,
      "model" : "greyscale",
      "defaultZ" : 0
    }
  }
}                                     

Since z is also a "spatial" dimension, to unify things we should probably get the following when running

./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeX=1.fake" test.zarr --compact
  "axes" : [ {
      "name" : "y",
      "type" : "space"
    }, {
      "name" : "z",
      "type" : "space"
    } ],

or fail when sizeZ=1

@sbesson
Member

sbesson commented Aug 19, 2025

I am not sure I follow. For me

./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeX=1.fake" test.zarr --compact

fails with "Found 1 spatial dimensions, try again without --compact", which is expected as the above is equivalent to

./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeX=1&sizeY=512&sizeZ=1&sizeC=1&sizeT=1.fake" test.zarr --compact

@jburel
Contributor

jburel commented Aug 19, 2025

./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeZ=1.fake" test.zarr --compact
generates a Zarr with x and y as axes, but X, Y and Z are all "spatial" dimensions, cf. the spec.
This means that one spatial dimension, i.e. Z, is handled differently.

@sbesson
Member

sbesson commented Aug 19, 2025

Below is the outcome of bioformats2raw with the --compact option using various combinations of sizes along the three spatial dimensions:

| Fake file | pass/fail | axes |
| --- | --- | --- |
| test&sizeX=512&sizeY=512&sizeZ=512.zarr | pass | zyx |
| test&sizeX=512&sizeY=512&sizeZ=1.zarr | pass | yx |
| test&sizeX=512&sizeY=1&sizeZ=512.zarr | pass | zx |
| test&sizeX=1&sizeY=512&sizeZ=512.zarr | pass | zy |
| test&sizeX=1&sizeY=1&sizeZ=512.zarr | fail | |
| test&sizeX=1&sizeY=512&sizeZ=1.zarr | fail | |
| test&sizeX=512&sizeY=1&sizeZ=1.zarr | fail | |
| test&sizeX=1&sizeY=1&sizeZ=1.zarr | fail | |

From the above, aren't all spatial dimensions treated the same way?

@jburel
Contributor

jburel commented Aug 19, 2025

No, because test&sizeZ=1.zarr, where sizeX and sizeY are not specified, works: sizeZ is ignored and sizeX and sizeY default to 512.
I think you need to add the following row to the above table:
test&sizeZ=1.zarr pass axes yx

But if you only specify test&sizeX=1.zarr, it fails.
This is an edge case because the default sizeZ is 1.
In order to have the 3 spatial dimensions handled in exactly the same way, the default value for sizeZ would need to be modified, but that might lead to side effects.

@sbesson
Member

sbesson commented Aug 19, 2025

Thanks, I understand now. Indeed, as mentioned in the documentation, the default values of the spatial dimension sizes in FakeReader are asymmetric. By construction, a fake file with no key e.g. test.fake will return an isotropic 2D plane and not an isotropic 3D volume.

Note that this discussion is orthogonal to the changes proposed in this PR. FakeReader was mostly used for testing convenience, as it allows us to quickly generate datasets of different dimensionalities, but any other supported format could be used instead, e.g. OME-XML or OME-TIFF. If the proposal is to change the behavior of FakeReader to treat all spatial dimensions similarly, this would need to be raised at the Bio-Formats level, with the caveat that it would be a backwards-incompatible change.

@sbesson sbesson merged commit 2e89d0c into glencoesoftware:master Aug 27, 2025
4 checks passed
@melissalinkert melissalinkert added this to the 0.11.0 milestone Sep 3, 2025

Development

Successfully merging this pull request may close these issues.

Convert to yx or cyx

3 participants