
feat: Implement coerce_types for Timestamp(Second), Interval(MonthDayNano) #9698

Open
CuteChuanChuan wants to merge 1 commit into apache:main from CuteChuanChuan:raymond/issue-1938-coerce-types-flag

@CuteChuanChuan CuteChuanChuan commented Apr 13, 2026

Which issue does this PR close?

Rationale for this change

Some Arrow types (Timestamp(Second), Interval(MonthDayNano)) have no direct corresponding Parquet logical type. PR #6840 introduced the coerce_types flag and implemented Date64 coercion, but Timestamp and Interval coercion remained unimplemented.

What changes are included in this PR?

Timestamp(Second):

  • coerce_types=true: stored as INT64 with LogicalType::Timestamp(MILLIS),
    values multiplied by 1000
  • coerce_types=false: stored as raw INT64 without logical type (unchanged)
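
The scaling step for Timestamp(Second) can be sketched in plain Rust. This is a minimal illustration, not the PR's code: the helper names `seconds_to_millis` and `coerce_seconds_column` are hypothetical, and returning an error on overflow is an assumption, since the description does not say whether the writer errors, wraps, or saturates.

```rust
/// Hypothetical helper: convert seconds since the epoch to milliseconds.
/// Returns `None` when the product does not fit in an `i64`.
fn seconds_to_millis(seconds: i64) -> Option<i64> {
    seconds.checked_mul(1_000)
}

/// Hypothetical helper: scale a nullable column of second-resolution
/// timestamps. Nulls pass through unchanged; an overflowing value is
/// reported as an error (an assumption, see the note above).
fn coerce_seconds_column(values: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    values
        .iter()
        .map(|value| match value {
            None => Ok(None),
            Some(s) => seconds_to_millis(*s)
                .map(Some)
                .ok_or_else(|| format!("timestamp {} s overflows i64 milliseconds", s)),
        })
        .collect()
}
```

The largest second value that survives the multiplication is `i64::MAX / 1000`; anything beyond that cannot be represented in milliseconds as an `i64`.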

Interval(MonthDayNano):

  • coerce_types=true: stored as 12-byte Parquet INTERVAL, nanoseconds
    truncated to milliseconds (lossy)
  • coerce_types=false: stored as 16-byte raw FIXED_LEN_BYTE_ARRAY,
    preserving full nanosecond precision (lossless)
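
To make the difference between the two layouts concrete, here is a hedged sketch of both encodings. The 12-byte form follows the Parquet INTERVAL definition (three little-endian unsigned 32-bit integers: months, days, milliseconds). The struct `MonthDayNano`, the function names, the 16-byte field order, and the choice to reject values that do not fit the unsigned fields are all assumptions made for illustration; the PR may handle these cases differently.

```rust
use std::convert::TryFrom;

/// Illustrative stand-in for an Arrow MonthDayNano interval value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MonthDayNano {
    months: i32,
    days: i32,
    nanoseconds: i64,
}

const NANOS_PER_MILLI: i64 = 1_000_000;

/// Lossy 12-byte encoding: months, days, millis as little-endian u32.
/// Sub-millisecond nanoseconds are dropped by the integer division.
/// Returns `None` if any field does not fit in a u32 (assumed policy).
fn encode_coerced(v: MonthDayNano) -> Option<[u8; 12]> {
    let months = u32::try_from(v.months).ok()?;
    let days = u32::try_from(v.days).ok()?;
    let millis = u32::try_from(v.nanoseconds / NANOS_PER_MILLI).ok()?;
    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Some(out)
}

/// Lossless 16-byte encoding: i32 months, i32 days, i64 nanoseconds,
/// all little-endian. The field order here is an assumption.
fn encode_raw(v: MonthDayNano) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&v.months.to_le_bytes());
    out[4..8].copy_from_slice(&v.days.to_le_bytes());
    out[8..16].copy_from_slice(&v.nanoseconds.to_le_bytes());
    out
}
```

For example, an interval of 1 month, 2 days, and 3,500,000 ns keeps all 3,500,000 ns in the 16-byte form but stores only 3 ms in the 12-byte form.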

Other:

  • Add reader support for MonthDayNano from both 12-byte and 16-byte formats
  • Add apply_hint rule for FixedSizeBinary(16) → Interval(MonthDayNano)
  • Remove NYI error for writing IntervalMonthDayNanoArray
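
Reading back from either format amounts to dispatching on the fixed byte width. The following is a sketch under stated assumptions rather than the reader code in this PR: `decode_month_day_nano` is a hypothetical name, the 16-byte field order (months, days, nanoseconds, little-endian) is assumed, and the 12-byte branch follows the Parquet INTERVAL layout of three little-endian u32 values.

```rust
use std::convert::TryFrom;

/// Copy the first four bytes of a slice into an array.
fn le4(bytes: &[u8]) -> [u8; 4] {
    let mut a = [0u8; 4];
    a.copy_from_slice(&bytes[..4]);
    a
}

/// Copy the first eight bytes of a slice into an array.
fn le8(bytes: &[u8]) -> [u8; 8] {
    let mut a = [0u8; 8];
    a.copy_from_slice(&bytes[..8]);
    a
}

/// Hypothetical decoder returning (months, days, nanoseconds).
fn decode_month_day_nano(bytes: &[u8]) -> Result<(i32, i32, i64), String> {
    match bytes.len() {
        // 12-byte Parquet INTERVAL: millisecond resolution, unsigned fields.
        12 => {
            let months = i32::try_from(u32::from_le_bytes(le4(&bytes[0..4])))
                .map_err(|_| "months exceeds i32::MAX".to_string())?;
            let days = i32::try_from(u32::from_le_bytes(le4(&bytes[4..8])))
                .map_err(|_| "days exceeds i32::MAX".to_string())?;
            let millis = i64::from(u32::from_le_bytes(le4(&bytes[8..12])));
            // u32::MAX * 1_000_000 fits comfortably in an i64.
            Ok((months, days, millis * 1_000_000))
        }
        // 16-byte raw form: full nanosecond precision, signed fields.
        16 => Ok((
            i32::from_le_bytes(le4(&bytes[0..4])),
            i32::from_le_bytes(le4(&bytes[4..8])),
            i64::from_le_bytes(le8(&bytes[8..16])),
        )),
        n => Err(format!("unexpected interval width: {} bytes", n)),
    }
}
```

Values read from the 12-byte form always come back as whole milliseconds, so a write with coerce_types=true followed by a read does not recover the original sub-millisecond nanoseconds.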

Are these changes tested?

Yes. 10 new tests added:

  • 5 schema-level tests verifying Parquet type output with/without coerce_types
  • 2 round-trip tests (write → read) for both Timestamp(Second) and MonthDayNano
  • 2 edge-case tests covering overflow boundaries, negative values, nulls, and nanosecond truncation behavior
  • 1 regression test confirming YearMonth/DayTime intervals are unaffected by coerce_types

The existing interval_month_day_nano_single_column test was updated from
#[should_panic] to a passing test.

Are there any user-facing changes?

  • IntervalMonthDayNanoArray is now supported by the Parquet writer (previously it returned an NYI error)
  • WriterProperties::set_coerce_types(true) now also affects Timestamp(Second) and Interval(MonthDayNano) columns

@github-actions github-actions bot added the parquet label (Changes to the parquet crate) Apr 13, 2026
Development

Successfully merging this pull request may close these issues.

Add coerce_types flag to parquet ArrowWriter
