
Switch from jzarr to zarr-java for v2/NGFF 0.4#302

Open
melissalinkert wants to merge 3 commits into glencoesoftware:master from melissalinkert:zarr-java-v2


@melissalinkert
Member

This fully removes the dependency on jzarr, and uses zarr-java for all versions.

There are a few test failures here, which are specific to v2 plate data. See note in Converter line 2816. Since there have been a few relevant fixes in zarr-java since 0.0.9 (especially zarr-developers/zarr-java#36), I expect to keep this as a draft until a new zarr-java version is released.

There are also a few commented to-do items around compression properties (likely needs documentation in addition to code changes), and testing of group key counts.

@melissalinkert melissalinkert marked this pull request as ready for review February 3, 2026 16:49
@dominikl

dominikl commented Feb 4, 2026

zarr-java 0.0.10 has been released now. Are you holding off on a new release for this PR? It would be nice to have a release with the --ngff-version argument for generating v3 zarrs.

@melissalinkert
Member Author

Mentioned separately, but note that 0.12.0-rc1 includes the --ngff-version option and can be used to write v3. We're still reviewing this PR; c37e250 updated to zarr-java 0.0.10 as soon as it was available.

Member

@sbesson sbesson left a comment


Converted the same set of source samples as used in #290 (review) using bioformats2raw 0.11.0 (i.e. with jzarr) as well as this library.

Conversion times for a range of compression options are given below

| Version | Options | Leica-1.scn | NIRHTa+001 | LuCa-7color_Scan1.qptiff |
| --- | --- | --- | --- | --- |
| 0.11.0 | --compression null | 0m51.015s | 0m43.750s | 0m44.531s |
| 0.11.0 | --compression zlib | 1m24.959s | 1m20.042s | 0m59.944s |
| 0.11.0 | --compression blosc | 0m55.334s | 0m46.729s | 0m40.767s |
| 0.11.0 | --compact | 0m55.951s | 0m49.904s | 0m39.888s |
| 0.12.0-SNAPSHOT | --compression null | 1m24.377s | 0m50.734s | 0m50.381s |
| 0.12.0-SNAPSHOT | --compression zlib | 1m57.604s | 2m10.099s | 1m38.594s |
| 0.12.0-SNAPSHOT | --compression blosc | 1m41.235s | 2m12.802s | 1m30.536s |
| 0.12.0-SNAPSHOT | --compact | 1m40.051s | 2m9.259s | 1m30.725s |

and dataset sizes

| Version | Options | Leica-1.scn | NIRHTa+001 | LuCa-7color_Scan1.qptiff |
| --- | --- | --- | --- | --- |
| 0.11.0 | --compression null | 5.5G | 4.3G | 5.8G |
| 0.11.0 | --compression zlib | 3.4G | 2.4G | 2.1G |
| 0.11.0 | --compression blosc | 4.3G | 4.0G | 3.2G |
| 0.11.0 | --compact | 4.3G | 4.0G | 3.2G |
| 0.12.0-SNAPSHOT | --compression null | 5.5G | 4.3G | 5.8G |
| 0.12.0-SNAPSHOT | --compression zlib | 3.2G | 2.4G | 2.0G |
| 0.12.0-SNAPSHOT | --compression blosc | 3.2G | 2.3G | 1.9G |
| 0.12.0-SNAPSHOT | --compact | 3.2G | 2.3G | 1.9G |
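
To put the timings above in relative terms, here is a minimal Python sketch that converts the `time`-style durations to seconds and reports the slowdown of 0.12.0-SNAPSHOT against 0.11.0 for Leica-1.scn. The helper names `parse_time` and `slowdown` are made up for illustration and are not part of bioformats2raw; the durations are copied from the table.

```python
import re


def parse_time(text: str) -> float:
    """Convert a duration such as '1m41.235s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown(baseline: str, candidate: str) -> float:
    """Ratio of candidate to baseline wall time (values above 1 mean slower)."""
    return parse_time(candidate) / parse_time(baseline)


# Leica-1.scn timings from the table: (0.11.0, 0.12.0-SNAPSHOT)
leica = {
    "--compression null": ("0m51.015s", "1m24.377s"),
    "--compression zlib": ("1m24.959s", "1m57.604s"),
    "--compression blosc": ("0m55.334s", "1m41.235s"),
    "--compact": ("0m55.951s", "1m40.051s"),
}

for options, (old, new) in leica.items():
    print(f"{options}: {slowdown(old, new):.2f}x")
```

The same dictionary shape can be reused for the other two samples by swapping in their columns.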

The compression internals are not strictly identical between jzarr and zarr-java

  • --compression blosc with jzarr

    "compressor" : {
      "clevel" : 5,
      "blocksize" : 0,
      "shuffle" : 1,
      "cname" : "lz4",
      "id" : "blosc"
    },
  • --compression blosc with zarr-java

    "compressor" : {
      "id" : "blosc",
      "cname" : "zstd",
      "shuffle" : 0,
      "clevel" : 5,
      "typesize" : 1,
      "blocksize" : 0
    },
    
  • --compression zlib with jzarr

    "compressor" : {
      "level" : 1,
      "id" : "zlib"
    },
  • --compression zlib with zarr-java

    "compressor" : {
      "id" : "zlib",
      "level" : 5
    },
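
The differences between the four `compressor` entries above can be listed mechanically. The following is a small Python sketch, with the dictionaries transcribed from the metadata shown above; `diff_config` is a hypothetical helper written for this comparison, not an API of either library.

```python
def diff_config(left: dict, right: dict) -> dict:
    """Map each differing key to a (left, right) pair; None marks a missing key."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


# Compressor metadata as written by each library (copied from the listings above)
jzarr_blosc = {"clevel": 5, "blocksize": 0, "shuffle": 1, "cname": "lz4", "id": "blosc"}
zarr_java_blosc = {
    "id": "blosc", "cname": "zstd", "shuffle": 0,
    "clevel": 5, "typesize": 1, "blocksize": 0,
}
jzarr_zlib = {"level": 1, "id": "zlib"}
zarr_java_zlib = {"id": "zlib", "level": 5}

print("blosc:", diff_config(jzarr_blosc, zarr_java_blosc))
print("zlib:", diff_config(jzarr_zlib, zarr_java_zlib))
```

For blosc this reports `cname`, `shuffle`, and the extra `typesize` key; for zlib only `level` differs.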

Independently of these differences, the results above suggest a performance degradation when using zarr-java, especially when compression is enabled. This is consistent with the results observed in the review of #290.

@melissalinkert
Member Author

ca328fe switches to the default compression settings that jzarr used (https://jzarr.readthedocs.io/en/latest/tutorial.html#compressors), and should now respect compression properties options again. That substantially speeds up the conversion time for LuCa-7color_Scan1.qptiff with zarr-java when testing locally.
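
One way to check which compressor settings ended up in a converted dataset is to read the Zarr v2 array metadata directly. This is a rough Python sketch that assumes only that each v2 array directory holds a `.zarray` JSON file with a `compressor` key; the function name `compressor_configs` and the output path in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def compressor_configs(root) -> list:
    """Return the distinct `compressor` entries found in `.zarray` files under root."""
    found = []
    for meta in sorted(Path(root).rglob(".zarray")):
        compressor = json.loads(meta.read_text()).get("compressor")
        if compressor not in found:
            found.append(compressor)
    return found


# Usage (hypothetical output directory from a conversion):
# print(compressor_configs("output.zarr"))
```

An uncompressed array stores `"compressor": null`, which shows up here as `None`.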

@sbesson
Member

sbesson commented Feb 13, 2026

Thanks @melissalinkert, with the latest commit, the sizes and conversion times are now very consistent between 0.11.0 with jzarr and this PR with zarr-java.

Dataset sizes:

| Version | Options | Leica-1.scn | NIRHTa+001 | LuCa-7color_Scan1.qptiff |
| --- | --- | --- | --- | --- |
| 0.11.0 | --compression null | 5.5G | 4.3G | 5.8G |
| 0.11.0 | --compression zlib | 3.4G | 2.4G | 2.1G |
| 0.11.0 | --compression blosc | 4.3G | 4.0G | 3.2G |
| 0.11.0 | --compact | 4.3G | 4.0G | 3.2G |
| 0.12.0-SNAPSHOT | --compression null | 5.5G | 4.3G | 5.8G |
| 0.12.0-SNAPSHOT | --compression zlib | 3.4G | 2.4G | 2.1G |
| 0.12.0-SNAPSHOT | --compression blosc | 4.3G | 2.5G | 3.2G |
| 0.12.0-SNAPSHOT | --compact | 4.3G | 2.5G | 3.2G |

Conversion times:

| Version | Options | Leica-1.scn | NIRHTa+001 | LuCa-7color_Scan1.qptiff |
| --- | --- | --- | --- | --- |
| 0.11.0 | --compression null | 1m2.504s | 0m45.412s | 0m45.840s |
| 0.11.0 | --compression zlib | 1m25.445s | 1m24.885s | 0m59.913s |
| 0.11.0 | --compression blosc | 1m1.152s | 0m48.590s | 0m45.504s |
| 0.11.0 | --compact | 1m0.541s | 0m47.545s | 0m44.798s |
| 0.12.0-SNAPSHOT | --compression null | 1m4.281s | 0m51.738s | 0m50.975s |
| 0.12.0-SNAPSHOT | --compression zlib | 1m33.890s | 1m32.787s | 1m11.114s |
| 0.12.0-SNAPSHOT | --compression blosc | 1m2.873s | 0m54.470s | 0m45.983s |
| 0.12.0-SNAPSHOT | --compact | 1m2.827s | 0m49.999s | 0m46.317s |

3 participants