A Python implementation of the cast_value codec for Zarr,
with zarr-python integration.
The cast_value codec defines how to safely convert arrays between integer
and float data types. In Zarr terminology, this codec is an "array -> array"
codec, which means its input and output are both arrays.
You can find the specification for this codec in the zarr-extensions repository.
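To make the "array -> array" idea concrete, here is a minimal NumPy sketch of the round trip: one step maps a float64 array to a uint8 array, and the other maps it back. The helper names `encode_sketch` and `decode_sketch` are hypothetical and are not part of this package. The rounding and clamping shown are only one plausible reading of "nearest-even" rounding with out-of-range clamping, not the specification's wording.

```python
import numpy as np


def encode_sketch(values: np.ndarray) -> np.ndarray:
    """Hypothetical encode step: float64 array in, uint8 array out."""
    info = np.iinfo(np.uint8)
    rounded = np.rint(values)  # np.rint rounds halves to the nearest even integer
    clamped = np.clip(rounded, info.min, info.max)  # keep values inside [0, 255]
    return clamped.astype(np.uint8)


def decode_sketch(stored: np.ndarray) -> np.ndarray:
    """Hypothetical decode step: uint8 array in, float64 array out."""
    return stored.astype(np.float64)


original = np.array([2.5, 3.5, -7.0, 300.0, 42.2])
stored = encode_sketch(original)
restored = decode_sketch(stored)

print(stored.dtype, stored.tolist())      # uint8 [2, 4, 0, 255, 42]
print(restored.dtype, restored.tolist())  # float64 [2.0, 4.0, 0.0, 255.0, 42.0]
```

Both the input and the output of each step are arrays of the same shape. Only the data type and, where rounding or clamping applies, the values change.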
This codec is commonly used for lossy data compression: when decoded data
should be high-precision floats, but the values fit within the range of a
smaller integer data type, encoding the floats as integers before writing
can greatly reduce the stored size.
For example, if your data is a sequence of float64 values like
[100.1, 120.3, 125.5], storing those values as uint8, e.g.
[100, 120, 125], offers 8-fold reduction in storage size, provided the
precision lost due to rounding is acceptable.
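The arithmetic behind that figure can be checked with plain NumPy, independently of this package. The sketch below compares the in-memory size of the two representations. It also shows that the exact integers depend on the rounding rule: truncating toward zero yields 125 for 125.5, as in the list above, while round-half-to-even yields 126.

```python
import numpy as np

floats = np.array([100.1, 120.3, 125.5], dtype=np.float64)

# Two ways to drop the fractional part before casting to uint8.
half_even = np.rint(floats).astype(np.uint8)   # round half to even
truncated = np.trunc(floats).astype(np.uint8)  # round toward zero

print(half_even.tolist())  # [100, 120, 126]
print(truncated.tolist())  # [100, 120, 125]

# float64 uses 8 bytes per value, uint8 uses 1 byte per value.
print(floats.nbytes, truncated.nbytes)  # 24 3
```

These are uncompressed sizes. If a compressor is applied afterwards, the final ratio on disk may differ.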
import numpy as np
import zarr

# Import the codec that uses the Rust backend
from cast_value import CastValueRustV1

# Configure the codec for an array with float64 dtype that is stored as uint8.
# The cast_value codec handles the conversion: float64 -> uint8 on write,
# uint8 -> float64 on read.
codec = CastValueRustV1(
    data_type="uint8",
    rounding="nearest-even",
    out_of_range="clamp",
    # Map non-finite floats to reserved integers on write, and back on read
    scalar_map={
        "encode": [(np.nan, 0), (np.inf, 1), (-np.inf, 2)],
        "decode": [(0, np.nan), (1, np.inf), (2, -np.inf)],
    },
)

# Create an in-memory array and write float64 data; values are rounded and
# clamped to [0, 255]
data = np.array([np.nan, np.inf, -np.inf, 3.3, 4])
arr = zarr.create_array(data=data, store=zarr.storage.MemoryStore(), filters=codec)

# Read it back: the result is float64, but with uint8 precision
result = arr[:]
print(f"Array dtype: {arr.dtype}")
print(f"Values written: {data}")
print(f"Values read: {result}")
"""
Array dtype: float64
Values written: [ nan inf -inf 3.3 4. ]
Values read: [ nan inf -inf 3. 4.]
"""

Davis Bennett (@d-v-b)