feat(python): Support partitioned tables in sink_iceberg #26864

Open
ldacey wants to merge 1 commit into pola-rs:main from ldacey:sink-iceberg-partitions

ldacey commented Mar 9, 2026

  • add _resolve_partition_key to map pyiceberg partition transforms to Polars expressions for PartitionBy

This extends the work done here: #26799

I have been using sink_parquet with PartitionBy, together with pyiceberg's add_files, to get lazy streaming writes to my Iceberg tables. Comparing my code with the code committed in that PR, I only had to make a few changes:

  • remove the NotImplementedError for partitioned tables
  • map the Iceberg partition transforms to Polars expressions to pass to PartitionBy
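
For readers less familiar with Iceberg transforms, here is a minimal plain-Python sketch of what the temporal and identity transforms compute, following the Iceberg spec definitions (years, months, and days counted from 1970-01-01). It is not the code in this PR: _resolve_partition_key builds Polars expressions, while the function names and the TRANSFORMS table below are hypothetical and only illustrate the dispatch idea on single values.

```python
from datetime import date

_EPOCH = date(1970, 1, 1)


def year_transform(d: date) -> int:
    # Iceberg "year": whole years since 1970.
    return d.year - 1970


def month_transform(d: date) -> int:
    # Iceberg "month": whole months since 1970-01.
    return (d.year - 1970) * 12 + (d.month - 1)


def day_transform(d: date) -> int:
    # Iceberg "day": whole days since 1970-01-01.
    return (d - _EPOCH).days


def identity_transform(value):
    # Iceberg "identity": the source value is the partition value.
    return value


# Hypothetical lookup table: transform name -> function (illustration only).
TRANSFORMS = {
    "year": year_transform,
    "month": month_transform,
    "day": day_transform,
    "identity": identity_transform,
}


def partition_value(transform: str, value):
    """Apply a named transform, rejecting anything not in the table."""
    try:
        fn = TRANSFORMS[transform]
    except KeyError:
        raise NotImplementedError(
            f"unsupported partition transform: {transform}"
        ) from None
    return fn(value)
```

A call such as partition_value("month", date(2026, 3, 9)) returns the month ordinal for that date, and an unlisted transform such as "bucket" raises NotImplementedError.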

Bucket transforms are not supported because they rely on murmur3 hashing.
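
To show what a bucket transform would involve, here is a plain-Python sketch based on the Iceberg spec, which defines the bucket as (murmur3_x86_32 hash & Integer.MAX_VALUE) % N, with longs hashed as 8 little-endian bytes and strings as UTF-8 bytes. The function names are hypothetical, only long and string inputs are covered, and none of this is part of the PR.

```python
import struct

_MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    # Rotate a 32-bit value left by r bits.
    return ((x << r) | (x >> (32 - r))) & _MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & _MASK32
    n_blocks = len(data) // 4

    # Body: mix each 4-byte little-endian block into the state.
    for i in range(n_blocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        k = (k * c1) & _MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & _MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & _MASK32

    # Tail: up to 3 remaining bytes.
    tail = data[4 * n_blocks:]
    k = 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & _MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & _MASK32
        h ^= k

    # Finalization: fold in the length, then avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & _MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & _MASK32
    h ^= h >> 16
    return h


def bucket_long(value: int, num_buckets: int) -> int:
    # Longs are hashed as 8 little-endian bytes.
    h = murmur3_x86_32(struct.pack("<q", value))
    return (h & 0x7FFFFFFF) % num_buckets


def bucket_str(value: str, num_buckets: int) -> int:
    # Strings are hashed as their UTF-8 bytes.
    h = murmur3_x86_32(value.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets
```

Every row's bucket depends on this exact hash, so a writer has to reproduce it bit for bit for the files to land in the partitions that Iceberg readers expect.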

I tested this with the simple pytests and with a 6 million row table under various partition specs (Month(date), Day(date), Identity(location)), writing directly to my GCS catalog and to local Iceberg tables.

  1. I used AI to generate pytests similar to the existing sink_iceberg tests, and to compare "feat: Add unstable LazyFrame.sink_iceberg" (#26799) against my local code to understand the scope (to make sure I would not need to touch Rust code, etc.)
  2. I confirm that I have reviewed all changes myself, and I believe they are
    relevant and correct.

@nameexhaustion

- add _resolve_partition_key to map pyiceberg partition transforms to polars expressions for PartitionBy
github-actions bot added the A-io-iceberg, enhancement, python, and first-contribution labels Mar 9, 2026
codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 91.17647% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.64%. Comparing base (9d60226) to head (4a04089).
⚠️ Report is 14 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| py-polars/src/polars/io/iceberg/_sink.py | 91.17% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #26864      +/-   ##
==========================================
+ Coverage   81.00%   81.64%   +0.64%     
==========================================
  Files        1805     1805              
  Lines      248021   248052      +31     
  Branches     3132     3140       +8     
==========================================
+ Hits       200902   202533    +1631     
+ Misses      46313    44713    -1600     
  Partials      806      806              


nameexhaustion added and removed the "do not merge" label Mar 10, 2026
ldacey (Author) commented Mar 11, 2026

Is there an issue with the code or a better approach?

nameexhaustion (Collaborator) commented

We're just not ready for this yet

Labels

A-io-iceberg, do not merge, enhancement, first-contribution, python
