Switch from jzarr to zarr-java for v2/NGFF 0.4 #302
melissalinkert wants to merge 3 commits into glencoesoftware:master
|
zarr-java 0.0.10 has been released now. Are you holding off on a new release for this PR? Would be nice to have a release with the |
|
Mentioned separately, but note that 0.12.0-rc1 includes the |
sbesson left a comment
Converted the same set of source samples as used in #290 (review) using bioformats2raw 0.11.0 (i.e. with jzarr) as well as this library.
Conversion times for a range of compression options are given below:
| Version | Options | Leica-1.scn | NIRHTa+001 | LuCa-7color_Scan1.qptiff |
|---|---|---|---|---|
| 0.11.0 | --compression null | 0m51.015s | 0m43.750s | 0m44.531s |
| 0.11.0 | --compression zlib | 1m24.959s | 1m20.042s | 0m59.944s |
| 0.11.0 | --compression blosc | 0m55.334s | 0m46.729s | 0m40.767s |
| 0.11.0 | --compact | 0m55.951s | 0m49.904s | 0m39.888s |
| 0.12.0-SNAPSHOT | --compression null | 1m24.377s | 0m50.734s | 0m50.381s |
| 0.12.0-SNAPSHOT | --compression zlib | 1m57.604s | 2m10.099s | 1m38.594s |
| 0.12.0-SNAPSHOT | --compression blosc | 1m41.235s | 2m12.802s | 1m30.536s |
| 0.12.0-SNAPSHOT | --compact | 1m40.051s | 2m9.259s | 1m30.725s |
The resulting dataset sizes are:
| Version | Options | Leica-1.scn | NIRHTa+001 | LuCa-7color_Scan1.qptiff |
|---|---|---|---|---|
| 0.11.0 | --compression null | 5.5G | 4.3G | 5.8G |
| 0.11.0 | --compression zlib | 3.4G | 2.4G | 2.1G |
| 0.11.0 | --compression blosc | 4.3G | 4.0G | 3.2G |
| 0.11.0 | --compact | 4.3G | 4.0G | 3.2G |
| 0.12.0-SNAPSHOT | --compression null | 5.5G | 4.3G | 5.8G |
| 0.12.0-SNAPSHOT | --compression zlib | 3.2G | 2.4G | 2.0G |
| 0.12.0-SNAPSHOT | --compression blosc | 3.2G | 2.3G | 1.9G |
| 0.12.0-SNAPSHOT | --compact | 3.2G | 2.3G | 1.9G |
The compression internals are not strictly identical between jzarr and zarr-java:
- `--compression blosc` with jzarr: `"compressor" : { "clevel" : 5, "blocksize" : 0, "shuffle" : 1, "cname" : "lz4", "id" : "blosc" },`
- `--compression blosc` with zarr-java: `"compressor" : { "id" : "blosc", "cname" : "zstd", "shuffle" : 0, "clevel" : 5, "typesize" : 1, "blocksize" : 0 },`
- `--compression zlib` with jzarr: `"compressor" : { "level" : 1, "id" : "zlib" },`
- `--compression zlib` with zarr-java: `"compressor" : { "id" : "zlib", "level" : 5 },`
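To make the differences in the blosc settings easier to see, here is a minimal sketch that diffs the two `compressor` blocks quoted above. It uses only the Java standard library and does not call jzarr or zarr-java; the class and method names (`CompressorDiff`, `diff`, `jzarrBlosc`, `zarrJavaBlosc`) are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper, not part of bioformats2raw, jzarr or zarr-java.
class CompressorDiff {

  /** Maps each key whose value differs (or is absent on one side) to "left vs right". */
  static Map<String, String> diff(Map<String, Object> left, Map<String, Object> right) {
    Map<String, String> out = new LinkedHashMap<>();
    // Visit the union of keys in sorted order so the report is stable.
    TreeSet<String> keys = new TreeSet<>(left.keySet());
    keys.addAll(right.keySet());
    for (String key : keys) {
      Object l = left.get(key);
      Object r = right.get(key);
      if (!Objects.equals(l, r)) {
        out.put(key, l + " vs " + r);
      }
    }
    return out;
  }

  /** Blosc settings written by 0.11.0 (jzarr), copied from the .zarray excerpt above. */
  static Map<String, Object> jzarrBlosc() {
    return Map.of("clevel", 5, "blocksize", 0, "shuffle", 1, "cname", "lz4", "id", "blosc");
  }

  /** Blosc settings written by 0.12.0-SNAPSHOT (zarr-java), copied from the excerpt above. */
  static Map<String, Object> zarrJavaBlosc() {
    return Map.of("id", "blosc", "cname", "zstd", "shuffle", 0, "clevel", 5,
        "typesize", 1, "blocksize", 0);
  }

  public static void main(String[] args) {
    // Reports the keys that differ: cname, shuffle and typesize.
    System.out.println(diff(jzarrBlosc(), zarrJavaBlosc()));
  }
}
```

The different codec (`lz4` vs `zstd`) and shuffle setting likely contribute to the smaller blosc output and the longer conversion times, although the sketch itself does not measure either.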
Independently of these differences, the results suggest a performance degradation when using zarr-java, especially when compression is enabled. This is consistent with the results observed in the review of #290.
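For reference, the slowdown can be quantified directly from the timing table. The sketch below parses the `XmY.YYYs` values reported by `time` and computes the ratio of the 0.12.0-SNAPSHOT time to the 0.11.0 time; the class and method names are hypothetical, and the inputs are the `--compression blosc` rows copied from the table above.

```java
import java.util.Locale;

// Hypothetical helper for reading the timing table; not part of bioformats2raw.
class ConversionSlowdown {

  /** Converts a duration such as "1m24.377s" to seconds. */
  static double toSeconds(String duration) {
    int m = duration.indexOf('m');
    int s = duration.indexOf('s', m);
    double minutes = Double.parseDouble(duration.substring(0, m));
    double seconds = Double.parseDouble(duration.substring(m + 1, s));
    return minutes * 60 + seconds;
  }

  /** Ratio of candidate time to baseline time; values above 1 mean the candidate is slower. */
  static double slowdown(String baseline, String candidate) {
    return toSeconds(candidate) / toSeconds(baseline);
  }

  public static void main(String[] args) {
    String[] samples = {"Leica-1.scn", "NIRHTa+001", "LuCa-7color_Scan1.qptiff"};
    // --compression blosc rows: 0.11.0 (jzarr) vs 0.12.0-SNAPSHOT (zarr-java).
    String[] jzarr = {"0m55.334s", "0m46.729s", "0m40.767s"};
    String[] zarrJava = {"1m41.235s", "2m12.802s", "1m30.536s"};
    for (int i = 0; i < samples.length; i++) {
      System.out.println(String.format(Locale.ROOT, "%s: %.2fx",
          samples[i], slowdown(jzarr[i], zarrJava[i])));
    }
  }
}
```

Each ratio comes from a single timed run per configuration, so it should be read as indicative rather than as a benchmark.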
|
ca328fe switches to the default compression settings that jzarr used (https://jzarr.readthedocs.io/en/latest/tutorial.html#compressors), and should now respect compression properties options again. That substantially speeds up the conversion time for |
|
Thanks @melissalinkert, with the latest commit, the size and conversion times are now very consistent when using
|
This fully removes the dependency on jzarr, and uses zarr-java for all versions.
There are a few test failures here, which are specific to v2 plate data. See note in `Converter` line 2816. Since there have been a few relevant fixes in zarr-java since 0.0.9 (especially zarr-developers/zarr-java#36), I expect to keep this as a draft until a new zarr-java version is released.
There are also a few commented to-do items around compression properties (likely needs documentation in addition to code changes), and testing of group key counts.