feat: Implement coerce_types for Timestamp(Second), Interval(MonthDayNano)#9698
Open
CuteChuanChuan wants to merge 1 commit into apache:main from
- Timestamp(Second) with coerce_types=true: store as INT64 with LogicalType::Timestamp(MILLIS), values multiplied by 1000
- Interval(MonthDayNano) with coerce_types=true: store as 12-byte Parquet INTERVAL, nanoseconds truncated to milliseconds
- Interval(MonthDayNano) with coerce_types=false: store as 16-byte raw FIXED_LEN_BYTE_ARRAY, preserving full nanosecond precision
- Add reader support for MonthDayNano from both 12-byte and 16-byte representations
- Add apply_hint rule for FixedSizeBinary(16) -> Interval(MonthDayNano)
- Remove NYI error for writing IntervalMonthDayNanoArray
- Add schema, round-trip, and edge-case tests
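As a rough illustration of the `Timestamp(Second)` coercion in the first bullet, here is a std-only Rust sketch of the value transformation. It is not the arrow-rs implementation: `coerce_seconds_to_millis` and `write_timestamp_second` are hypothetical names, and returning `None` on overflow via `checked_mul` is an assumption, since the PR text only says values are multiplied by 1000.

```rust
/// Scale factor from seconds to milliseconds.
const MILLIS_PER_SECOND: i64 = 1_000;

/// Hypothetical helper: convert a Timestamp(Second) value into the INT64
/// millisecond value stored under LogicalType::Timestamp(MILLIS).
/// Returns None if seconds * 1000 does not fit in i64 (assumed behavior).
fn coerce_seconds_to_millis(seconds: i64) -> Option<i64> {
    seconds.checked_mul(MILLIS_PER_SECOND)
}

/// Hypothetical helper: the INT64 value written for one Timestamp(Second) cell.
/// With coerce_types = false the raw seconds are written unchanged.
fn write_timestamp_second(seconds: i64, coerce_types: bool) -> Option<i64> {
    if coerce_types {
        coerce_seconds_to_millis(seconds)
    } else {
        Some(seconds)
    }
}
```

With `coerce_types = true`, `1_700_000_000` seconds becomes `1_700_000_000_000` milliseconds; with `coerce_types = false` it is written as-is.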
Which issue does this PR close?
Rationale for this change
Some Arrow types (`Timestamp(Second)`, `Interval(MonthDayNano)`) have no directly corresponding Parquet logical type. PR #6840 introduced the `coerce_types` flag and implemented `Date64` coercion, but `Timestamp` and `Interval` coercion remained unimplemented.
What changes are included in this PR?
Timestamp(Second):
- `coerce_types=true`: stored as INT64 with `LogicalType::Timestamp(MILLIS)`, values multiplied by 1000
- `coerce_types=false`: stored as raw INT64 without a logical type (unchanged)
Interval(MonthDayNano):
- `coerce_types=true`: stored as a 12-byte Parquet INTERVAL, with nanoseconds truncated to milliseconds (lossy)
- `coerce_types=false`: stored as a 16-byte raw FIXED_LEN_BYTE_ARRAY, preserving full nanosecond precision (lossless)
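The two interval layouts, and reading either one back, can be sketched in std-only Rust. This is not the arrow-rs code: `MonthDayNano`, `encode_interval_12`, `encode_interval_16`, and `decode_interval` are hypothetical. The 12-byte layout follows the Parquet INTERVAL spec (months, days, milliseconds as little-endian 32-bit fields, treated as signed here); the 16-byte field order (months, days, nanoseconds, little-endian) is assumed to mirror Arrow's in-memory `IntervalMonthDayNano` layout; and rejecting millisecond counts that overflow 32 bits is an assumption the PR text does not address.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical mirror of an Arrow IntervalMonthDayNano value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanoseconds: i64,
}

const NANOS_PER_MILLI: i64 = 1_000_000;

/// coerce_types = true: 12-byte Parquet INTERVAL (months, days, millis).
/// Sub-millisecond nanoseconds are dropped; integer division truncates toward zero.
/// Returns None if the millisecond count does not fit in 32 bits (assumed behavior).
fn encode_interval_12(v: MonthDayNano) -> Option<[u8; 12]> {
    let millis = i32::try_from(v.nanoseconds / NANOS_PER_MILLI).ok()?;
    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&v.months.to_le_bytes());
    out[4..8].copy_from_slice(&v.days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Some(out)
}

/// coerce_types = false: 16 raw bytes (months, days, nanoseconds); lossless.
fn encode_interval_16(v: MonthDayNano) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&v.months.to_le_bytes());
    out[4..8].copy_from_slice(&v.days.to_le_bytes());
    out[8..16].copy_from_slice(&v.nanoseconds.to_le_bytes());
    out
}

/// Reader side: rebuild a MonthDayNano from either representation.
/// Returns None for any other byte length.
fn decode_interval(bytes: &[u8]) -> Option<MonthDayNano> {
    // Read a little-endian i32 at offset i (only called once the length is checked).
    let i32_at = |i: usize| i32::from_le_bytes(bytes[i..i + 4].try_into().unwrap());
    match bytes.len() {
        12 => Some(MonthDayNano {
            months: i32_at(0),
            days: i32_at(4),
            // Milliseconds are widened back to nanoseconds; the truncated part is gone.
            nanoseconds: i64::from(i32_at(8)) * NANOS_PER_MILLI,
        }),
        16 => Some(MonthDayNano {
            months: i32_at(0),
            days: i32_at(4),
            nanoseconds: i64::from_le_bytes(bytes[8..16].try_into().unwrap()),
        }),
        _ => None,
    }
}
```

For example, a value of 1 month, 2 days, and 3,500,000,123 ns survives the 16-byte path unchanged but comes back from the 12-byte path as 3,500,000,000 ns, which is the lossy/lossless distinction described above.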
Other:
- `apply_hint` rule for `FixedSizeBinary(16)` → `Interval(MonthDayNano)`
- Removed the NYI error for writing `IntervalMonthDayNanoArray`
Are these changes tested?
Yes. 10 new tests added:
- Schema, round-trip, and edge-case tests, exercising both `coerce_types=true` and `coerce_types=false`
The existing `interval_month_day_nano_single_column` test was updated from `#[should_panic]` to a passing test.
Are there any user-facing changes?
- `IntervalMonthDayNanoArray` is now supported by the Parquet writer (previously it returned a NYI error)
- `WriterProperties::set_coerce_types(true)` now also affects `Timestamp(Second)` and `Interval(MonthDayNano)` columns
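To summarize the user-facing effect of the flag on the two affected column types, here is a small std-only Rust model. `ArrowType`, `ParquetLayout`, `parquet_layout`, and `is_lossless` are hypothetical names used only for illustration; they are not part of the parquet crate's API, and the mapping simply restates the behavior listed in this PR description.

```rust
/// Hypothetical model of the two Arrow column types affected by this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowType {
    TimestampSecond,
    IntervalMonthDayNano,
}

/// Hypothetical summary of how such a column is stored in Parquet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParquetLayout {
    /// INT64 with LogicalType::Timestamp(MILLIS); values scaled by 1000.
    Int64TimestampMillis,
    /// INT64 with no logical type; raw seconds.
    Int64Raw,
    /// 12-byte Parquet INTERVAL; nanoseconds truncated to milliseconds.
    Interval12,
    /// 16-byte FIXED_LEN_BYTE_ARRAY; full nanosecond precision.
    FixedLenByteArray16,
}

/// Storage chosen for a column, depending on the coerce_types writer setting.
fn parquet_layout(t: ArrowType, coerce_types: bool) -> ParquetLayout {
    match (t, coerce_types) {
        (ArrowType::TimestampSecond, true) => ParquetLayout::Int64TimestampMillis,
        (ArrowType::TimestampSecond, false) => ParquetLayout::Int64Raw,
        (ArrowType::IntervalMonthDayNano, true) => ParquetLayout::Interval12,
        (ArrowType::IntervalMonthDayNano, false) => ParquetLayout::FixedLenByteArray16,
    }
}

/// Only the 12-byte INTERVAL layout drops information (sub-millisecond nanoseconds).
/// Scaling seconds to milliseconds is exact as long as the product fits in i64.
fn is_lossless(layout: ParquetLayout) -> bool {
    layout != ParquetLayout::Interval12
}
```

In this model, enabling `coerce_types` trades nanosecond precision for a standard Parquet INTERVAL annotation on interval columns, while timestamp columns only gain the `Timestamp(MILLIS)` annotation.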