Add unified and configurable null handling#1101
Conversation
309cff9 to
7fbc001
Compare
viiccwen
left a comment
There was a problem hiding this comment.
overall lg, but need to update docs, and left some comments.
| /// Reads Float64 data from a Parquet file. | ||
| /// | ||
| /// Expects a single Float64 column. For zero-copy access, use [`read_parquet_to_arrow`]. | ||
| pub fn read_parquet<P: AsRef<Path>>(path: P) -> Result<Vec<f64>> { |
There was a problem hiding this comment.
currently we specify read_* ( read_parquet(...) ) to use NullHandling::FillZero, should we adopt same strategy like *Readers (ParquetReader)?
like this?
pub fn read_parquet<P: AsRef<Path>>(path: P) -> Result<Vec<f64>>;
// to
pub fn read_parquet<P: AsRef<Path>>(
path: P,
null_handling: Option<NullHandling>,
) -> Result<Vec<f64>>;There was a problem hiding this comment.
Looks like we can use ParquetReader::new(path, None, NullHandling::Reject) when we want to configure that. We see read_parquet as a convenient function? (not sure about that, but we can left it to the future)
| // Initialize reader | ||
| let mut reader_core = crate::io::ParquetBlockReader::new(path, None)?; | ||
| let mut reader_core = | ||
| crate::io::ParquetBlockReader::new(path, None, crate::reader::NullHandling::FillZero)?; |
There was a problem hiding this comment.
It seems stream_encode() uses hardcoded NullHandling::FillZero for ParquetBlockReader. Batch and streaming reader paths are unified; this call site has no config. I think maybe we could have a follow-up here if stream_encode is ever given a config or options, expose null_handling there. otherwise document that streaming encode keeps FillZero for backward compatibility.
Related Issues
Closes #765
Changes
Why
How
NullHandlingenum (FillZero|Reject)0.0while streaming mode threw a runtime error — both paths now use the samehandle_float64_nulls()helperFillZerofor backward compatibility;Rejectreturns a clear error with guidancePipelineConfig, PyO3 bindings, and the PythonQuantumDataLoaderbuilder (.null_handling("fill_zero" | "reject"))Checklist