Skip to content

chore: replace legacy datetime rebase tests with current scan coverage [iceberg]#3605

Draft
andygrove wants to merge 7 commits intoapache:mainfrom
andygrove:rebase-datetime
Draft

chore: replace legacy datetime rebase tests with current scan coverage [iceberg]#3605
andygrove wants to merge 7 commits intoapache:mainfrom
andygrove:rebase-datetime

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Feb 26, 2026

Which issue does this PR close?

N/A

Rationale for this change

The original motivation for this PR was to:

  • Remove legacy ParquetDatetimeRebaseSuite, because all tests are ignored and are specific to the native_comet scan, which no longer exists.
  • Add tests confirming that COMET_EXCEPTION_ON_LEGACY_DATE_TIMESTAMP works correctly with native_iceberg_compat

However, I discovered that COMET_EXCEPTION_ON_LEGACY_DATE_TIMESTAMP has no effect because there is a dead code path. Therefore, I added a new test to confirm current behavior and updated the docs.

Although CometParquetFileFormat.getDatetimeRebaseSpec() is called and the config was read, the resulting useLegacyDateTimestamp flag is passed to NativeColumnReader, which overrides initNative() and does not call Utils.initColumnReader(). The flag was stored but never acted upon — the pre-1582 date check in the Rust column decoder (values.rs) was never reached.

Note that COMET_EXCEPTION_ON_LEGACY_DATE_TIMESTAMP may still be respected by the Iceberg integration.

What changes are included in this PR?

  • CometConf.scala: Removed COMET_EXCEPTION_ON_LEGACY_DATE_TIMESTAMP
  • CometParquetFileFormat.scala: Removed config propagation to Hadoop conf. Simplified getDatetimeRebaseSpec() to always set CORRECTED mode when LEGACY is detected (no conditional, no warning log).
  • parquet_scans.md: Removed reference to the deleted config. Simplified the rebasing limitation description.
  • ParquetReadSuite.scala: Added test "reading ancient dates before 1582" that reads a Parquet file written by Spark 3.2.0 with org.apache.spark.legacyDateTime metadata, verifies the plan uses Comet operators, and confirms all 8 rows with pre-1582 dates are read successfully.

How are these changes tested?

@andygrove andygrove marked this pull request as draft February 26, 2026 18:01
@andygrove andygrove changed the title chore: remove legacy ParquetDatetimeRebaseSuite and update docs chore: remove legacy ParquetDatetimeRebaseSuite and update docs [iceberg] Feb 26, 2026
@andygrove andygrove changed the title chore: remove legacy ParquetDatetimeRebaseSuite and update docs [iceberg] chore: replace legacy datetime rebase tests with current scan coverage [iceberg] Feb 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant