Add --compact option to reduce array dimensionality #279
sbesson merged 5 commits into glencoesoftware:master
|
Started to test this in the context of glencoesoftware/omero-zarr-pixel-buffer#28 (comment). The shape of the arrays generated with/without Although it's an edge case for the kind of data this converter is typically used with, the NGFF 0.4 specification expects that |
sbesson
left a comment
This PR has been extensively tested in the context of glencoesoftware/omero-zarr-pixel-buffer#28 (review) using both synthetic sample files of all supported dimensions orders that are subsets of XYZCT as well as real-world datasets in various modalities (high-content screening, brightfield and fluorescence whole slide imaging, fluorescence 3D imaging, segmentations).
In all scenarios, data was generated with the correct hierarchy and metadata both at the multiscales group level as well as at the individual array level. Validation through the OMERO application was consistent with and without the --compact option.
My only code request is whether the unit tests could be extended to cover more scenarios, at least the 2D (yx) case and one 4D case, so that we are protected against regressions.
From the command-line perspective, I think --compact is a good option. The only other one I was considering was --reduce or even --reduce-dimensions.
Are Zarr datasets generated with reduced dimensions expected to remain compatible with raw2ometiff (probably assuming corresponding appropriate changes)?
|
If I get things right, we consider the base to be |
|
Just tested and 2D planes whose spatial axes are not strictly Incidentally, |
|
Interestingly but generates |
|
This is the expected behaviour since if |
|
That makes sense. For the Bio-Formats synthetic images, |
sbesson
left a comment
With the latest changes, the following command run with --compact (and any input leading to fewer than 2 spatial dimensions) now errors with a meaningful message:
sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % ./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeY=1.fake" test.zarr
sbesson@Sebastien-GS-MacBook-Pro-2025 bioformats2raw % ./bioformats2raw-0.11.0-SNAPSHOT/bin/bioformats2raw "test&sizeY=1.fake" test_compact.zarr --compact
2025-08-19 08:52:39,399 [main] ERROR c.g.bioformats2raw.Converter - Error while writing series 0
java.lang.IllegalArgumentException: Found 1 spatial dimensions, try again without --compact
The tests cover all scenarios I can think of at present. The README should probably be updated with some documentation around the --compact functionality but this can be done in a follow-up PR.
@jburel anything else from your side?
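The check reported in this review can be sketched as follows. This is only an illustration in Python, not the converter's actual Java code: the function name `check_compact`, the t, c, z, y, x axis order, and the rule that a spatial dimension counts only when its length is greater than 1 are all assumptions. Only the error message text is taken from the log above.

```python
# Hypothetical sketch of the --compact validation; not the real implementation.
ALL_AXES = ("t", "c", "z", "y", "x")   # assumed 5D axis order
SPATIAL_AXES = ("z", "y", "x")


def check_compact(sizes):
    """Return the axes kept after dropping length-1 dimensions.

    `sizes` maps each axis name to its length. Raises ValueError when
    fewer than 2 spatial axes would remain.
    """
    kept = [axis for axis in ALL_AXES if sizes[axis] > 1]
    spatial = [axis for axis in kept if axis in SPATIAL_AXES]
    if len(spatial) < 2:
        raise ValueError(
            f"Found {len(spatial)} spatial dimensions, "
            "try again without --compact"
        )
    return kept


# "test&sizeY=1.fake": X keeps its default of 512, Y is forced to 1
try:
    check_compact({"t": 1, "c": 1, "z": 1, "y": 1, "x": 512})
except ValueError as err:
    print(err)
```

Under these assumptions, a plain 512 x 512 plane passes the check and keeps only the y and x axes.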
|
I tested the following Since z is a "spatial" dimension, to unify things, we should probably have when running or fail when sizeZ=1 |
|
I am not sure I follow. For me fails with |
|
|
|
Below is the outcome of
From the above, aren't all spatial dimensions treated the same way? |
|
No, because if you don't specify sizeX and sizeY, e.g. test&sizeZ=1.zarr, it works: sizeZ is ignored and sizeX and sizeY default to 512. But if you only specify sizeX, e.g. test&sizeX=1.zarr, it fails. |
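The asymmetry described here comes from the defaults of the synthetic reader. A small sketch makes it concrete, assuming the two inputs are `.fake` files, that the fake reader defaults are sizeX=512, sizeY=512, sizeZ=1, sizeC=1, sizeT=1, and that only spatial axes longer than 1 are counted. The helpers `fake_sizes` and `spatial_count` are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers; they only mimic how sizes are read from a fake filename.
FAKE_DEFAULTS = {"sizeX": 512, "sizeY": 512, "sizeZ": 1, "sizeC": 1, "sizeT": 1}


def fake_sizes(filename):
    """Parse 'name&key=value&...' style fake filenames into a size table."""
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    sizes = dict(FAKE_DEFAULTS)
    for token in stem.split("&")[1:]:          # skip the leading image name
        key, _, value = token.partition("=")
        if key in sizes:
            sizes[key] = int(value)
    return sizes


def spatial_count(sizes):
    """Count the spatial axes whose length is greater than 1."""
    return sum(1 for key in ("sizeX", "sizeY", "sizeZ") if sizes[key] > 1)


for name in ("test&sizeZ=1.fake", "test&sizeX=1.fake"):
    print(name, spatial_count(fake_sizes(name)))
```

With these assumptions, the first input keeps two spatial dimensions (X and Y at their default of 512), while the second keeps only Y, which is why only the second one is rejected with --compact.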
|
Thanks, I understand now. Indeed, as mentioned in the documentation, the default values of the spatial dimension sizes in Note this discussion is orthogonal to the changes proposed in this PR. |
The default behavior should be unchanged, but using the --compact option should eliminate any dimension with length 1. As indicated in the test, something like bioformats2raw --compact "test&sizeZ=3.fake" test.zarr should end up with a 3D array, instead of a 5D array.
I've done some testing locally, but more testing, in particular with non-fake data, is in order. Maybe also worth thinking about whether X and Y should always be included, or if there are any other restrictions that should be put on this option.
cc @mabruce @erindiel @joshmoore @dominikl @jburel
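As a rough illustration of the reduction described in this PR, the following Python sketch drops every axis of length 1 from a 5D shape. It assumes the default array layout is ordered t, c, z, y, x and that the fake input has 512 x 512 planes. The function name `compact_shape` is hypothetical and is not part of bioformats2raw.

```python
# Assumed default 5D axis order; the converter itself is written in Java.
AXES = ("t", "c", "z", "y", "x")


def compact_shape(shape):
    """Drop every axis of length 1 and return (axes, shape) for the rest."""
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} dimensions, got {len(shape)}")
    kept = [(name, size) for name, size in zip(AXES, shape) if size != 1]
    return tuple(name for name, _ in kept), tuple(size for _, size in kept)


# "test&sizeZ=3.fake": 1 timepoint, 1 channel, 3 Z sections, 512 x 512 planes
axes, shape = compact_shape((1, 1, 3, 512, 512))
print(axes, shape)
```

For this input the sketch keeps the z, y and x axes, matching the 3D array mentioned in the description.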