Changelog

rust-v0.30.0 (2025-12-31)

⚠️ There are a number of API changes between 0.30.x and 0.29.4, but we expect to see better performance as a result!

Full Changelog

Merged pull requests:

  • refactor: remove log_data call sites in find_files #4026 (roeap)
  • chore: remove wildcard dependency for publishing #4025 (rtyler)
  • refactor: use logical type ref when getting stats #4019 (roeap)
  • fix: handle stats config in data sink #4016 (roeap)
  • fix: null handling when extracting scalars #4014 (roeap)
  • fix: between range handling in expression translations #4013 (roeap)
  • chore: fix windows uri test #4011 (hntd187)
  • refactor: towards lazier snapshots #4010 (roeap)
  • fix: pin pyspark and clear disk space in runners #4007 (ion-elgreco)
  • test: add utilities for asserting DAT scan results #4005 (roeap)
  • chore: update delta-kernel to 0.19 #4004 (roeap)
  • refactor: simplify kernel extensions #4003 (roeap)
  • chore: clippy #4002 (roeap)
  • refactor: handle target version when resolving snapshot #4001 (roeap)
  • refactor: use rstest for running DAT tests #4000 (roeap)
  • feat: kernel expression conversion #3998 (roeap)
  • chore: add easier local coverage reporting #3995 (rtyler)
  • feat: expose operations on DeltaTable #3987 (roeap)
  • chore: remove some warnings #3986 (roeap)
  • chore: normalize Url going into logstore and update everything to take references #3985 (rtyler)
  • fix: add missing field to snapshot serde #3984 (roeap)
  • feat: allow for concurrent deletes in conflict checker if data_change is false #3982 (abhiaagarwal)
  • fix: remove 3.9 from ci matrix #3978 (ion-elgreco)
  • fix: decode path before lookup #3976 (ion-elgreco)
  • chore: remove deprecated pyo3 methods #3975 (ion-elgreco)
  • chore: removing APIs and deprecation warnings: 0.30.x here we come #3962 (rtyler)
  • feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14 #3949 (hntd187)
  • fix: schema evolution for merge operation #3945 (JustinRush80)
  • chore: remove Python 3.9 from our build infrastructure #3937 (rtyler)
  • docs: fix small typo issue #3935 (bmoreau8)
  • chore: removing references to using partition_filters for partition overwrite #3912 (zyd14)
  • feat(datafusion): add max_temp_directory_size parameter for z-order and compact operations for DataFusion #3847 (fvaleye)

Fixed bugs:

  • Asked to increase max_temp_directory_size in the disk manager configuration when optimizing large table #3833

Closed issues:

  • [Bug]: Count / get_add_actions exception for an empty table #4023
  • [Bug]: MERGE with schema evolution does not add new columns #4009
  • [Bug]: vacuum does not respect retention_hours when full=True #3989
  • [Bug]: write table by FFI call from go may memory leak? #3973
  • [Bug]: Table merging fails with merge_schema=True #3943
  • [Bug]: _internal.DeltaError: Generic DeltaTable error: Unable to map __delta_rs_path to action during overwrite with predicate #3939
  • [Feature]: update to DataFusion 51.0.0 #3920
  • [Bug]: get_add_actions() panics with "index out of bounds" when table has no data files #3918
  • [Bug]: Docs describe partition_filters parameter to write_deltalake that doesn't exist #3904
  • [Feature]: split delta-rs into multiple crates #3899
  • [Feature]: Drop python 3.9 support once EOL #3886
  • [Bug]: PyPi storage limit hit for deltalake [python releases blocked for time-being] #3876

rust-v0.29.4 (2025-11-15)

Full Changelog

Closed issues:

  • [Bug]: _delta_log not written with external S3 provider #3925
  • [Bug]: write_deltalake doesn't raise ValueError if the schema of the data passed differs from the existing table's schema #3917
  • [Bug]: Writing a number as a Decimal that exactly fits into its precision results in a decimal overflow #3909

Merged pull requests:

  • chore: adding more test coverage to the Gcp crate #3931 (rtyler)
  • chore: remove proofs/ which are no longer used #3930 (rtyler)
  • fix: correctly rectify Urls with dots in DeltaTableBuilder #3929 (rtyler)
  • chore(cargo): unify cargo profiles #3924 (fvaleye)
  • feat: add GCS auto-registration via ctor hooks #3923 (ethan-tyler)
  • fix: handle empty tables in get_add_actions() #3922 (vsmanish1772)
  • chore: bump the patch version to release fixes #3919 (rtyler)
  • fix: update stats serialization logic for scale-0 decimals #3916 (DrakeLin)

rust-v0.29.3 (2025-10-31)

Full Changelog

Implemented enhancements:

  • Port benchmarks from spark delta to delta-rs #3843

Closed issues:

  • [Bug]: AWS SSO no longer supported as of v.1.2.0 #3897
  • [Feature]: Ability to trace object store calls via snapshot load #3892

Merged pull requests:

  • chore: cleaning up warnings and preparing 0.29.3 #3910 (rtyler)
  • perf(snapshot): minor memory allocation and usage reduction without cloning #3903 (fvaleye)
  • feat(typed-builder): adopt typed-builder for safer builder pattern in non-core crates #3902 (fvaleye)
  • fix: use the default features of aws-config #3898 (rtyler)
  • feat: generate a Symlink Manifest for External Engines #3889 (JustinRush80)
  • chore: reduce wheel size #3878 (abhiaagarwal)

rust-v0.29.2 (2025-10-24)

Full Changelog

Implemented enhancements:

  • Make get_actions() sync #3837
  • Get the table row count based on the table history #3731

Closed issues:

  • [Bug]: SchemaMismatchError: Invalid data type for Delta Lake: Null with mode=append #3891
  • [Bug]: Panic: Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type #3884
  • [Bug]: Chunked dataframe creates small files #3871

Merged pull requests:

rust-v0.29.1 (2025-10-18)

Full Changelog

Implemented enhancements:

  • Redo the merge benchmarks #3839
  • Consolidate Datafusion session management #3799
  • Introduce tracing spans to all I/O sections #3641
  • feat(tracing): add tracing spans to all I/O sections #3795 (fvaleye)

Fixed bugs:

  • Python package has MIT and Apache license in metadata #3853
  • Severe native memory leak in delta-rs Python API when reading small result sets (local and remote tables, repeated queries) #3832
  • Memory error when writing an empty table #3827
  • Duplicated metadata action entry on table overwrite #3785

Merged pull requests:

  • chore: change the core and meta crate versions for release #3864 (rtyler)
  • chore: deprecate file_actions on state #3863 (roeap)
  • fix: resolve some warnings #3862 (roeap)
  • chore: remove some deprecated methods #3861 (roeap)
  • refactor: consolidate datafusion session setup #3860 (roeap)
  • perf: support pushing physical filters down through DeltaScan #3859 (alexwilcoxson-rel)
  • feat: allow RecordBatchWriter to pass through pass-through-commit-properties #3858 (rtyler)
  • chore: upgrade datafusion, arrow and parquet #3856 (dentiny)
  • fix: update pyproject.toml #3854 (wagenrace)
  • fix(core): handle Result type after get_actions sync conversion #3846 (yousefsaad12)
  • chore: bump version from 1.1.4 to 1.2.0 #3842 (ion-elgreco)
  • perf(path): only clone string for the path #3841 (fvaleye)
  • feat(bench): add new benchmarking script, harness, and profiling guide #3840 (abhiaagarwal)
  • refactor(bench): remove baseline while keeping the json_parsing benchmark #3838 (fvaleye)
  • feat: enable ability to do writes through Unity Catalog #3834 (hntd187)
  • feat: datafusion based kernel engine #3831 (roeap)
  • chore(performance): optimize JSON parsing in get_actions and snapshot reading #3830 (fvaleye)

rust-v0.29.0 (2025-10-10)

Full Changelog

Implemented enhancements:

  • use generic session trait for tracking Datafusion sessions. #3825
  • Use kernel expressions for extracting data from Record batches. #3808
  • Allow passing SessionState into OptimizeBuilder #3797
  • consistently use EagerSnapshot in datafusion module. #3784
  • Update to delta_kernel 0.16.0 #3774
  • UnityCatalogBuilder cannot be initialized via StorageConfig. #3757
  • Get file stats from metadata, useful for creating AddActions #3735
  • feat(datafusion): add insert_into operation with DataFusion #3762 (fvaleye)

Fixed bugs:

  • Azure Data Lake Storage with "az://" path not working for Deltalake Python>1.1.0 #3824
  • Non UTC Timestamps, are not being saved as respective timezones #3823
  • post_commithook_properties checkpoint flag is not honored #3780
  • Running into delta-kernel bug on 28.0 #3770
  • larger table causes overflow deep in arrow land #3767
  • can_read_from is not called in TableProvider (or other read paths) #3765
  • deltalake mixing struct columns based on order #3750
  • Kernel error when vacuuming table #3745
  • write_deltalake with schema_mode=None will overwrite nullable properties of columns #3744
  • Export DecimalType from delta_kernel_rs. #3729
  • write_deltalake fail writes when using emulator #3716
  • Unable to write checkpoint on 0.27.0 due to tags going from nullable to not #3693
  • bug: table.cleanup_metadata does not correctly preserve files #3692

Closed issues:

  • Lower memory usage during merge when streamed_exec=False #3786
  • Provide a non-collecting iterator for history() #3753
  • Allow configuring SessionConfig in optimize #3751
  • WorkspaceOAuthProvider::fetch_token doesn't check HTTP status code. #3739

Merged pull requests:

  • fix(rust): protect recent uncommitted files in vacuum full mode #3835 (vsmanish1772)
  • fix: update deprecation versions to next release #3828 (roeap)
  • feat: expose arrow schema on snapshots #3822 (roeap)
  • chore: remove unreferenced file #3819 (roeap)
  • feat: shim kernel Scan and ScanBuilder #3818 (roeap)
  • chore: unify inconsistent SessionState in datafusion operations #3816 (abhi-airspace-intelligence)
  • refactor: move find_files into dedicated mod #3815 (roeap)
  • refactor: avoid downcasting to SessionState #3813 (roeap)
  • refactor: use EagerSnapshot in vacuum operation #3812 (roeap)
  • feat: access tombstones via TombstoneView #3809 (roeap)
  • ci: split out integration tests #3806 (roeap)
  • fix: maintaining load config from state #3805 (ion-elgreco)
  • refactor: consolidate extension planners #3804 (roeap)
  • refactor: remove table_url from Snapshot #3803 (roeap)
  • feat: allow passing a SessionState into a OptimizeBuilder #3802 (abhi-airspace-intelligence)
  • fix: avoid overflow for large table state #3801 (roeap)
  • refactor: use EagerSnapshot in datafusion module #3796 (roeap)
  • chore(ci): add automatic cache cleanup for closed main branch PRs #3793 (fvaleye)
  • fix: somehow the right test value didn't make it into the pr #3788 (rtyler)
  • fix: correct RecordBatchWriter interior schema mutation outside of evolution #3783 (rtyler)
  • chore: fix some typos in comment #3781 (juejinyuxitu)
  • chore: pin cargo-machete action to the sha right before a regression #3779 (rtyler)
  • chore: upgrade to delta-kernel-rs 0.16.0 and remove more dependencies #3773 (rtyler)
  • chore: upgrade the aws dependencies in deltalake-aws #3772 (rtyler)
  • fix: check if eligible to read #3771 (ion-elgreco)
  • feat(unity-catalog): support credentials via storage options #3769 (fvaleye)
  • fix: ensure that invalid URLs are bubbled up as errors when parsed #3766 (rtyler)
  • feat: change history() to return an Iterator #3764 (rtyler)
  • feat: allow OptimizeBuilder to accept SessionConfig for finer-grained control of execution #3763 (rtyler)
  • chore: update docs #3761 (ion-elgreco)
  • fix: better error handles in unity client #3752 (hntd187)
  • feat: update to DataFusion 50, pyo3 24, pyo3-arrow 0.11 #3749 (alamb)
  • fix: use a safe checkpoint when cleaning up metadata #3748 (corwinjoy)
  • fix: write_deltalake with mode="overwrite" mode and schema_mode=None does not overwrite schema metadata #3747 (FrankPortman)
  • fix: re-export the DecimalType for consumers #3738 (rtyler)
  • feat: add per column Parquet Encoding support for Delta Table column #3737 (niltecedu)
  • refactor: avoid explicit mutex in MergeBarrier #3734 (roeap)
  • feat: get the delta table row count based on the table history #3732 (ohadmata)
  • chore(ci): cache rust dependencies in the CI #3728 (fvaleye)
  • refactor: move table provider to dedicated mod #3726 (roeap)
  • feat: add deletion_vector_descriptor method #3721 (zeevm)
  • chore!: remove deprecated methods #3715 (roeap)
  • feat(url): use Url in Rust for accessing to DeltaTable, use only string-based api in Python #3707 (fvaleye)

rust-v0.28.1 (2025-08-30)

Full Changelog

Merged pull requests:

  • chore: bump to a minor version for a small core release with the new kernel #3718 (rtyler)
  • feat(storage): expand user with tilde in local path #3717 (fvaleye)
  • chore: update kernel to 0.15.1 #3714 (roeap)
  • chore(cargo): add cargo-machete to detect and remove unused dependencies #3713 (fvaleye)
  • chore: follow up changes on rust-v0.28.0 #3712 (rtyler)
  • feat: domain metadata read support #3678 (roeap)

rust-v0.28.0 (2025-08-27)

Full Changelog

⚠️ There is a known performance regression when opening very wide tables (50+ columns) that have hundreds of thousands of transactions. The fix is pending a new delta-kernel-rs release.

Implemented enhancements:

  • Python: Automatically convert Pandas null types to valid Delta Lake types in write_deltalake() #3691
  • Update HDFS object store to 0.15 #3680
  • create a v2 uuid checkpoint regression test #3666
  • Feature: update python table vacuum to add keep_versions parameter #3634
  • TypeError in DeltaTable.to_pyarrow_dataset when using non-string partition filter values (e.g., int) #3597
  • Make "cloud" feature optional #3589
  • convert_to_deltalake cannot convert parquet dataset if it has millisecond-precision timestamps #3535
  • Musl wheels #3399

Fixed bugs:

  • Automatically register the AWS, Azure, GCS, HDFS, LakeFS, and Unity storage handlers when the corresponding feature is enabled so DeltaOps::try_from_uri no longer errors with unknown schemes such as gs://.
  • Significant performance regression when opening S3 table on next branch #3667
  • Concurrent overwrite doesn't fail conflict checking #3622
  • source distributions missing in v1.1.1 #3621
  • Missing linux distro for v1.1.1 #3620
  • azurite tests failing in main #3612
  • Generic S3 Error on _last_checkpoint on ARM64 AWS Lambda with write_deltalake #3602
  • write_deltalake merge with list and large_list #3595
  • Python DeltaTable does not support writes in multiple threads (again?) #3594
  • Checkpoint creation fails on Azure in >=1.0.0 with "Azure does not support suffix range requests" #3593
  • Partition value strings containing reserved ASCII and non-ASCII are double-encoded. #3577
  • Deltalake version 1.0.2 errors with Azure Storage after appending many times #3567
  • Python deltalake 1.0.2 is not compatible with polars.Array #3566
  • Checkpoint schema breaking change between 0.25.5 and 1.0.2 #3527

Closed issues:

  • Array/list not encoded with partition filters #3648

Merged pull requests:

  • fix: reintroduce the 100 commit checkpoint interval #3708 (rtyler)
  • fix: enabling correctly pulling partition values out of column mapped tables #3706 (rtyler)
  • fix(format): fix formatting in Python for conversion file #3705 (fvaleye)
  • chore: remove unused dependencies #3698 (rtyler)
  • fix(pandas): implement-automatic-conversion-for-pandas-null-types #3695 (fvaleye)
  • chore: update hdfs object store to 0.15 #3681 (Kimahriman)
  • feat!: use kernel predicates on file streams #3669 (roeap)
  • chore: bump python #3664 (ion-elgreco)
  • fix: use RFC3986 percent encoding with delta protocol correctness #3661 (ion-elgreco)
  • feat!: kernel log replay #3660 (roeap)
  • ci: run integration tests against next branches #3658 (roeap)
  • fix: handle checking partition filters in array/list when converting … #3657 (smeyerre)
  • fix: aws special paths encoding #3656 (roeap)
  • feat: support converting parquet with non-microsecond timestamps to d… #3654 (smeyerre)
  • chore: use pytest-xdist for speeding up python tests #3642 (rtyler)
  • chore: remove deprecated use of kernel's Table #3639 (rtyler)
  • feat: add keep_versions parameter to vacuum command for python #3635 (corwinjoy)
  • chore: bump version for release #3633 (rtyler)
  • fix: avoid parsing generationExpressions as JSON #3632 (rtyler)
  • feat: build musl wheels upon release #3631 (rtyler)
  • fix: make the docs link checking more useful/less faily #3630 (rtyler)
  • fix: coerce polars.Array into a suitable Arrow list type #3623 (rtyler)
  • fix: ensure openssl-sys doesn't creep into the dependency via the kernel default engine #3619 (rtyler)
  • fix: allow writing to DeltaTable objects across Python threads #3618 (rtyler)
  • docs: fix broken daft links (how daft) #3617 (rtyler)
  • fix: ensure new checkpoints can be written after old checkpoints #3616 (rtyler)
  • fix: switch the url schemes for Azure integration tests #3614 (rtyler)
  • fix: allow non-string primitive types for partition filters when converting to pyarrow dataset #3613 (smeyerre)
  • refactor: make match_partitions and new_metadata public #3605 (zeevm)
  • fix: fix typo to fix CI typo check #3604 (alamb)
  • chore: update to DataFusion 49.0.0 #3603 (alamb)
  • chore: minor API changes after integration testing #3598 (rtyler)
  • fix: scan time was always 0 for merge metrics #3596 (rtyler)
  • refactor: make "cloud" feature in object_store optional #3590 (zeevm)
  • chore: generate a more recentish updated changelog #3588 (rtyler)
  • fix: creating new DeltaTable with invalid table name path no longer creates empty directory #3504 (smeyerre)

rust-v0.27.0 (2025-07-12)

Full Changelog

Implemented enhancements:

  • Feature: Vacuum with version retention #3530
  • Any way to prune the delta_log or support shallow clones #3565
  • Upgrade Arrow version to 55.1.0 #3540
  • Add config option to suppress deltalake_core::writer::stats warnings about bytes columns #3519
  • Remove pyarrow dependency (make opt-in), replace with arro3 for core components #3455
  • Don't retry lakefs commit or merge on 412 response (precondition failed) #3429
  • Use object_store spawnService #3427
  • Alter table description #3401
  • Remove put if absent options injection #3310
  • v1.0 Release tracking issue #3250
  • feat: add a table description and name to the Delta Table from Python #3464 (fvaleye)

Fixed bugs:

  • Python building broken on main due to maturin issue #3559
  • TypeError: write_deltalake() got an unexpected keyword argument 'schema' (deltalake/polars) #3546
  • SchemaMismatchError on empty ArrayType field while contains_null=True #3544
  • Can't open a delta-table: Unsupported reader features required: DeletionVectors #3543
  • Attempting to write a transaction 3 but the underlying table has been updated to 3 #3534
  • DeltaOps not recognizing abfss scheme for Azure #3523
  • Query execution time difference between QueryBuilder and using DataFusion directly. #3517
  • bug: timezone not preserved & raise exc on merge operation #3507
  • allow_unsafe_rename option stopped working in version 1 #3493
  • predicate appears to ignore partition and stats in pruning #3491
  • max_rows_per_file ignored when writing with rust engine #3490
  • delta-rs includes pending versions written by spark #3422

Merged pull requests:

  • chore: bump minor version for rust crate #3586 (rtyler)
  • refactor!: use delta-kernel Protocol and Metadata actions #3581 (roeap)
  • feat: vacuum with version retention #3537 (corwinjoy)
  • chore: bump patch versions for another release #3585 (rtyler)
  • feat: write engineInfo with delta-rs version #3584 (zachschuermann)
  • chore: remove the deltalake-sql crate #3582 (rtyler)
  • chore: latest clippy #3571 (roeap)
  • feat: convert partition filters to kernel predicates #3570 (roeap)
  • refactor: move schema code to kernel module #3569 (roeap)
  • chore: remove redundant words in comment #3568 (shangchenglumetro)
  • docs: ensure create_checkpoint() is visible in the Python API docs #3564 (itamarst)
  • chore: upgrade to delta_kernel 0.12.x #3561 (rtyler)
  • chore: clean up licenses in python project which are causing build issues #3560 (rtyler)
  • chore: update arrow/parquet to 55.2.0 #3558 (alamb)
  • fix: use proper DeltaTableState for vacuum commits #3550 (jeromegn)
  • fix: version binary search #3549 (aditanase)
  • chore: update the minor version to reflect a behavior change #3542 (rtyler)
  • chore: pin aws crates #3532 (ion-elgreco)
  • chore: set java version to 21 for pyspark 4.0 #3524 (ion-elgreco)
  • fix: using state provided in args in merge op #3522 (gtrawinski)
  • refactor: remove unnecessary uses of datafusion subcrates #3521 (alamb)
  • chore: update to DataFusion 48.0.0 / arrow to 55.2.0 #3520 (alamb)
  • feat: make TableConfig accessible #3518 (ion-elgreco)
  • fix: remove forced table update from python writer #3515 (ohanf)
  • refactor: compute stats schema with kernel types #3514 (roeap)
  • feat: add convenience extension for kernel engine types #3510 (roeap)
  • refactor: move LazyTableProvider into python crate #3509 (roeap)
  • fix: setting wrong schema in table provider for merge #3508 (ion-elgreco)
  • fix: constraint parsing, roundtripping #3503 (ion-elgreco)
  • refactor!: have DeltaTable::version return an Option #3500 (roeap)
  • chore!: remove get_earliest_version #3499 (roeap)
  • chore: prepare for the next python release #3498 (rtyler)
  • ci: improve coverage collection #3497 (roeap)
  • chore: update runner #3494 (ion-elgreco)
  • docs: update link to df #3489 (rluvaton)
  • refactor!: remove and deprecate some python methods #3488 (roeap)
  • fix: ensure projecting only columns that exist in new files afte sche… #3487 (alexwilcoxson-rel)
  • chore: exclude Invariants from the default writer v2 feature set #3486 (rtyler)
  • test: improve storage config testing #3485 (roeap)
  • refactor!: get transaction versions for specific applications #3484 (roeap)
  • docs: fix bullet list formatting in dagster docs #3483 (avriiil)
  • fix: set casting safe param to False #3481 (ion-elgreco)
  • chore: update kernel to 0.11 #3480 (roeap)
  • chore: update migration docs #3479 (ion-elgreco)
  • chore: remove unused stats_parsed field #3475 (roeap)
  • refactor: remove protocol error #3473 (roeap)
  • chore: more typos #3471 (roeap)
  • chore: remove unused time_utils #3470 (roeap)
  • chore: set correct markers #3469 (ion-elgreco)
  • fix: schema conversion, add conversion test cases #3468 (ion-elgreco)
  • feat: write checkpoints with kernel #3466 (roeap)
  • fix: correct spelling errors found by CI spell checker #3465 (fvaleye)
  • chore: update kernel #3462 (roeap)
  • fix: use more accurate log path parsing #3461 (roeap)
  • refactor: remove pyarrow dependency #3459 (ion-elgreco)
  • chore: mark more tests which require datafusion #3458 (rtyler)
  • ci: add spellchecker to pr tests #3457 (roeap)
  • refactor: use full paths in log processing #3456 (roeap)
  • chore: ensuring default builds work without datafusion #3453 (rtyler)
  • refactor: use LogStore in Snapshot / LogSegment APIs #3452 (roeap)
  • test: avoid circular dependency with core/test crates #3450 (roeap)
  • feat: expose kernel Engine on LogStore #3446 (roeap)
  • refactor: more specific factory parameter names #3445 (roeap)
  • docs: add 1.0.0 migration guide #3443 (ion-elgreco)
  • chore: minor table module refactors #3442 (rtyler)
  • chore: remove unused code and deps #3441 (roeap)
  • chore: experiment with using sccache in GitHub Actions #3437 (rtyler)
  • feat: optimize datafusion predicate pushdown and partition pruning #3436 (rtyler)
  • chore: prepare py-1.0 release #3435 (ion-elgreco)
  • chore: make codecov more vigorously enforced to help ensure quality #3434 (rtyler)
  • chore: rely on the testing during coverage generation to speed up tests #3431 (rtyler)
  • chore: bump crate versions which are due for release #3430 (rtyler)
  • chore(deps): bump foyer to v0.17.2 to prevent from wrong result #3428 (MrCroxx)
  • feat: spawn io with spawn service #3426 (ion-elgreco)
  • fix: ignore temp log entries #3423 (corwinjoy)
  • fix: build Unity Catalog crate without DataFusion #3420 (linhr)
  • feat: added a check for gc code to run #3419 (JustinRush80)
  • chore: include license file in deltalake-derive crate #3417 (ankane)
  • fix: drop column update #3416 (ion-elgreco)
  • chore: missed a version bump for core #3415 (rtyler)
  • chore: bringing dat integration testing in ahead of kernel replay #3411 (rtyler)
  • chore: reduce scope of feature flags and compilation requirements for subcrates #3409 (rtyler)
  • chore: commit the contents of the 0.26.0 release #3408 (rtyler)
  • chore: bump versions of rust crates for another release party #3406 (rtyler)
  • fix: the default target size should be 100MB #3404 (HiromuHota)
  • chore: update delta_kernel to 0.10.0 #3403 (zachschuermann)
  • refactor: make "cloud" feature in object_store optional #3398 (zeevm)
  • chore: put a couple symbols behind the right feature gate #3393 (rtyler)
  • fix: clippy warnings #3390 (alamb)
  • feat: derive macro for config implementations #3389 (roeap)
  • feat!: update storage configuration system #3383 (roeap)
  • refactor!: move storage module into logstore #3382 (roeap)
  • chore: move proofs into dedicated folder #3381 (roeap)
  • refactor: move transaction module to kernel #3380 (roeap)
  • chore: clippy #3379 (roeap)
  • feat: upgrade to DataFusion 47.0.0 #3378 (alamb)
  • fix: if field contains space in constraint expression, checks will fail #3374 (Nordalf)
  • fix: parse unconventional logs #3373 (roeap)
  • feat: introduce VacuumMode::Full for cleaning up orphaned files #3368 (rtyler)
  • chore: fix some minor build warnings #3366 (rtyler)
  • chore: remove cdf feature #3365 (ion-elgreco)
  • docs: add example of how to authenticate using Azure CLI for Azure ADLS integration #3357 (DanielBertocci)
  • fix: parse snapshot #3355 (ion-elgreco)
  • docs: update merge-tables.md with "Optimizing Merge Performance" section #3351 (ldacey)
  • fix: use field physical name when resolving partition columns #3349 (zeevm)
  • feat: during LakeFS file operations, skip merge when 0 changes #3346 (smeyerre)
  • refactor(python): improve typing, linting #3344 (ion-elgreco)
  • docs: update dataFusion integration example #3343 (riziles)
  • perf: use lazy sync reader #3338 (ion-elgreco)
  • feat(api): add rustls and native-tls features #3335 (zeevm)
  • refactor: add 'cloud' feature to 'core' to enable 'cloud' on 'object_store' only when needed #3332 (zeevm)
  • chore: improve io error msg #3328 (ion-elgreco)
  • chore: remove pyarrow upper #3325 (ion-elgreco)
  • fix: block_in_place to allow nested tasks #3324 (ion-elgreco)
  • fix: check for all known valid delta files in is_deltatable #3318 (umartin)
  • fix: added restored metadata as action to the next committed version #3303 (Nordalf)
  • fix: correct Python docs for incremental compaction on OPTIMIZE #3301 (roykim98)

rust-v0.26.2 (2025-05-15)

Full Changelog

Fixed bugs:

  • Unable to use deltalake with MinIO #3418
  • Column __delta_rs_update_predicate when update a table #3414
  • Onelake incompatible with rustls #3243

rust-v0.26.1 (2025-05-05)

Full Changelog

rust-v0.26.0 (2025-05-03)

Full Changelog

Implemented enhancements:

  • Make "cloud" feature optional #3397
  • Delta Column mapping feature #3358
  • Allow choosing kernel built-in engine #3334
  • Don't enable "cloud" feature on object_store by default #3331
  • Cannot use latest delta-rs with pyarrow 19.0.1 #3323
  • Support for Parquet Bloom Filters at RowGroup Level #3322

Fixed bugs:

  • Timezone information in schema dropped and converted to UTC when writing PyArrow table #3402
  • Got "Generic DeltaTable error: type_coercion" when updating the deltatable #3400
  • Removing pinned chrono version #3391
  • CDF changes fully loaded into memory #3388
  • Unable to use write_deltalake on nullable columns with mode = "overwrite" #3387
  • uv add deltalake fails without pinning to 0.25.4 #3364
  • Unable to use Arrow UUID type with DeltaLake and schema().to_pyarrow(), Python Exception raised. #3363
  • Writer Incompatibility Issue Between Delta Lake Protocol Version and Rust Writer #3356
  • Restore not readable in Synapse/Databricks #3354
  • Restore is not rolling back schema changes #3352
  • Schema Mode merge with subfields 'struct fields don't match' #3350
  • Error resolving renamed partition columns #3348
  • DeltaError on attempting merge operation involving map field #3340
  • MaxCommitAttempts error due to stale snapshot used during OPTIMIZE #3337
  • _internal.DeltaError: Failed to parse parquet: Parquet error: Z-order failed while scanning data #3327
  • Error writing delta table with Null columns #3316
  • Error in writing panda data frame with pyarrow engine to DeltaTable #3315
  • DeltaTable.is_deltatable sometimes return false for valid deltatable #3314
  • Runtime panic in new streaming writer: Cannot start a runtime from within a runtime #3271
  • Build failure due to conflicting strum versions in deltalake-core dependencies #3267

Closed issues:

  • How to read delta table in azure databricks unity catalog #3276
  • Feature request: provide capability for pass in timestamp value when dump delta log #3258

rust-v0.25.0 (2025-03-09)

Full Changelog

Implemented enhancements:

  • Configurable column encoding for parquet checkpoint files to address Fabric limitation #3212

Fixed bugs:

  • Writing from Windows host to Ubuntu WSL virtual machine doesn't succeed #3307
  • DeltaTable(path) errors out for unknown data types #3305
  • Commit Properties / Custom Metadata not loaded #3304
  • peek_next_commit() panics on invalid json #3297
  • cargo build fail on macOS 15.3.1 #3295
  • Error when writing polars dataframe with enum or categorical columns #3284
  • Schema evolution causing table ID to be regenerated, breaks Spark streaming jobs #3274
  • is_deltatable creates the path if it doesn't exist #3259
  • is_deltatable throwing S3 error in 0.25.1 on linux aarch64 build #3241
  • Datafusion error: External error: Failed to get a credential from UnityCatalog client configuration. #3236
  • deltalake 0.24.0 cannot compile with E0308 error #3235
  • 0.25.0 can't be pip installed on linux #3234
  • Azure blob - trying to fetch _delta_log for large table exception: Failed to parse parquet: External: Generic MicrosoftAzure error: error decoding response body #3232
  • Trying to open a DeltaTable at a non-existent path creates the path #3228
  • don't write deletion vector entry in the log #3211

Closed issues:

  • Memory leak on 0.25.x #3306

Merged pull requests:

rust-v0.20.1 (2024-09-27)

Full Changelog

Implemented enhancements:

  • Allow to specify Azurite hostname and service port as backend #2900
  • docs section usage/Managing a table is out of date w.r.t. optimizing tables #2891
  • generate more sensible row group size #2545

Fixed bugs:

  • Cannot write to Minio with deltalake.write_deltalake or Polars #2894
  • Schema Mismatch Error When appending Parquet Files with Metadata using Rust Engine #2888
  • Assume role support has been broken since 2022 🤣 #2879
  • z-order fails on table that is partitioned by value with space #2834
  • "builder error for url" when creating an instance of a DeltaTable which is located in an azurite blob storage #2815

Closed issues:

  • delta-rs can't write to a table if datafusion is not enabled #2910

rust-v0.20.0 (2024-09-18)

Full Changelog

Fixed bugs:

  • DeltaTableBuilder flags ignored #2808
  • Require files in config is not anymore used to skip reading add actions #2796

Merged pull requests:

rust-v0.19.1 (2024-09-11)

Full Changelog

Implemented enhancements:

  • question: deletionVectors support #2829
  • [Minor] Make Add::get_json_stats public #2821
  • expose target_file_size in python side for WriterProperties #2810
  • expose default_column_properties, column_properties of parquet WriterProperties in python #2785
  • CDC support in deltalog when writing delta table #2720
  • Function behaving similarly to SHOW PARTITIONS in the Python API #2671
  • Expose set_statistics_truncate_length via Python WriterProperties #2630

Fixed bugs:

  • write_deltalake with predicate throw index out of bounds #2867
  • writing to blobfuse has stopped working in 0.19.2 #2860
  • cannot read from public GCS bucket if non logged in #2859
  • Stats missing for dataSkippingStatsColumns when escaping column name #2849
  • 0.19.2 install error when using poetry, pdm on Ubuntu #2848
  • deltalake-* crates use different version than specified in Cargo.toml, leading to unexpected behavior #2847
  • Databricks fails integrity check after compacting with delta-rs #2839
  • "failed to load region from IMDS" back in 0.19 despite AWS_EC2_METADATA_DISABLED=true #2819
  • min/max_row_groups not respected #2814
  • Large Memory Spike on Merge #2802
  • Deleting large number of records fails with no error message #2798
  • max_spill_size incorrect default value #2794
  • Delta-RS Saved Delta Table not properly ingested into Databricks #2779
  • Missing Linux binary releases and source tarball for Python release v0.19.0 #2777
  • Transaction log parsing performance regression #2760
  • RecordBatchWriter only creates stats for the first 32 columns; this prevents calling create_checkpoint. #2745
  • DeltaScanBuilder does not respect datafusion context's datafusion.execution.parquet.pushdown_filters #2739
  • IN (...) clauses appear to be ignored in merge commands with S3 - extra partitions scanned #2726
  • Trailing slash on AWS_ENDPOINT raises S3 Error #2656
  • AsyncChunkReader::get_bytes error: Generic MicrosoftAzure error: error decoding response body #2592

rust-v0.19.0 (2024-08-14)

Full Changelog

Implemented enhancements:

  • Only allow squash merge #2542

Fixed bugs:

  • Write also insert change types in writer CDC #2750
  • Regression in Python multiprocessing support #2744
  • SchemaError occurs during table optimisation after upgrade to v0.18.1 #2731
  • AWS WebIdentityToken exposure in log files #2719
  • Write performance degrades with multiple writers #2683
  • Write monotonic sequence, but read is non monotonic #2659
  • Python write_deltalake with schema_mode="merge" casts types #2642
  • Newest docs (potentially) not released #2587
  • CDC is not generated for Structs and Lists #2568

Closed issues:

Merged pull requests:

rust-v0.18.2 (2024-08-07)

Full Changelog

Implemented enhancements:

  • Choose which columns to store min/max values for #2709
  • Projection pushdown for load_cdf #2681
  • Way to check if Delta table exists at specified path #2662
  • Support HDFS via hdfs-native package #2611
  • Deletion _change_type does not appear in change data feed #2579
  • Could you please explain in the README what "Deltalake" is for the uninitiated? #2523
  • Discuss: Allow protocol change during write actions #2444
  • Support for Arrow PyCapsule interface #2376

Fixed bugs:

  • Slow add_actions.to_pydict for tables with large number of columns, impacting read performance #2733
  • append is deleting records #2716
  • segmentation fault - Python 3.10 on Mac M3 #2706
  • Failure to delete dir and files #2703
  • DeltaTable.from_data_catalog not working #2699
  • Project should use the same version of ruff in the lint stage of python_build.yml as in pyproject.toml #2678
  • un-tracked columns are giving json error when pyarrow schema have field with nullable=False and create_checkpoint is triggered #2675
  • [BUG]write_delta({'custom_metadata':str}) cannot be converted. str to pyDict error (0.18.2_DeltaPython/Windows10) #2697
  • Pyarrow engine not supporting schema overwrite with Append mode #2654
  • deltalake-core version re-exported by deltalake different than versions used by deltalake-azure and deltalake-gcp #2647
  • i32 limit in JSON stats #2646
  • Rust writer not encoding correct URL for partitions in delta table #2634
  • Large Types breaks merge predicate pruning #2632
  • Getting error when converting a partitioned parquet table to delta table #2626
  • Arrow: Parquet does not support writing empty structs when creating checkpoint #2622
  • InvalidTableLocation("Unknown scheme: gs") on 0.18.0 #2610
  • Unable to read delta table created using Uniform #2578
  • schema merging doesn't work when overwriting with a predicate #2567
  • Not working in AWS Lambda (0.16.2 - 0.17.4) OSError: Generic S3 error #2511
  • DataFusion filter on partition column doesn't work. (when the physical schema ordering is different to logical one) #2494
  • Creating checkpoints for tables with missing column stats results in Err #2493
  • Cannot merge to a table with a timestamp column after upgrading delta-rs #2478
  • Azure AD Auth fails on ARM64 #2475
  • Generic S3 error: Error after 0 retries ... Broken pipe (os error 32) #2403
  • write_deltalake identifies large_string as datatype even though string is set in schema #2374
  • Inconsistent arrow timestamp type breaks datafusion query #2341

Closed issues:

  • Unable to write new partitions with type timestamp on tables created with delta-rs 0.10.0 #2631

Merged pull requests:

rust-v0.18.0 (2024-06-12)

Full Changelog

Implemented enhancements:

  • documentation: concurrent writes for non-S3 backends #2556
  • pyarrow options for write_delta #2515
  • [deltalake_aws] Allow configuring separate endpoints for S3 and DynamoDB clients. #2498
  • Include file stats when converting a parquet directory to a Delta table #2490
  • Adopt the delta kernel types #2489

Fixed bugs:

  • raise_if_not_exists for properties not configurable on CreateBuilder #2564
  • write_deltalake with rust engine fails when mode is append and overwrite schema is enabled #2553
  • Running the basic_operations examples fails with Error: Transaction { source: WriterFeaturesRequired(TimestampWithoutTimezone) } #2552
  • invalid peer certificate: BadSignature when connecting to s3 from arm64/aarch64 #2551
  • load_cdf() issue : Generic S3 error: request or response body error: operation timed out #2549
  • write_deltalake fails on Databricks volume #2540
  • Getting "Microsoft Azure Error: Operation timed out" when trying to retrieve big files #2537
  • Impossible to append to a DeltaTable with float data type on RHEL #2520
  • Creating DeltaTable object slow #2518
  • write_deltalake throws parser error when using rust engine and big decimals #2510
  • TypeError: Object of type int64 is not JSON serializable when writing using a Pandas dataframe #2501
  • unable to read delta table when table contains both null and non-null add stats #2477
  • Commits on WriteMode::MergeSchema cause table metadata corruption #2468
  • S3 object store always returns IMDS warnings #2460
  • File skipping according to documentation #2427
  • LockClientError #2379
  • get_app_transaction_version() returns wrong result #2340
  • Property setting in create is not handled correctly #2247
  • Handling of decimals in scientific notation #2221
  • Unable to append to delta table without datafusion feature #2204
  • Decimal Column with Value 0 Causes Failure in Python Binding #2193

Merged pull requests:

rust-v0.17.3 (2024-05-01)

Full Changelog

Implemented enhancements:

  • Limit concurrent ObjectStore access to avoid resource limitations in constrained environments #2457
  • How to get a DataFrame in Rust? #2404
  • Allow checkpoint creation when partition column is "timestampNtz " #2381
  • is there a way to make writing timestamp_ntz optional #2339
  • Update arrow dependency #2328
  • Release GIL in deltalake.write_deltalake #2234
  • Unable to retrieve custom metadata from tables in rust #2153
  • Refactor commit interface to be a Builder #2131

Fixed bugs:

  • Handle rate limiting during write contention #2451
  • regression : delta.logRetentionDuration don't seems to be respected #2447
  • Issue writing to mounted storage in AKS using delta-rs library #2445
  • TableMerger - when_matched_delete() fails when Column names contain special characters #2438
  • Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merge data in to delta table #2423
  • Merge on predicate throw error on date column: Unable to convert expression to string #2420
  • Writing Tables with Append mode errors if the schema metadata is different #2419
  • Logstore issues on AWS Lambda #2410
  • Datafusion timestamp type doesn't respect delta lake schema #2408
  • Compacting produces smaller row groups than expected #2386
  • ValueError: Partition value cannot be parsed from string. #2380
  • Very slow s3 connection after 0.16.1 #2377
  • Merge update+insert truncates a delta table if the table is big enough #2362
  • Do not add readerFeatures or writerFeatures keys under checkpoint files if minReaderVersion or minWriterVersion do not satisfy the requirements #2360
  • Create empty table failed on rust engine #2354
  • Getting error message when running in lambda: message: "Too many open files" #2353
  • Temporary files filling up _delta_log folder - increasing table load time #2351
  • compact fails with merged schemas #2347
  • Cannot merge into table partitioned by date type column on 0.16.3 #2344
  • Merge breaks using logical datatype decimal128 #2343
  • Decimal types are not checked against max precision/scale at table creation #2331
  • Merge update+insert truncates a delta table #2320
  • Extract add.stats_parsed with wrong type #2312
  • Process fails without error message when executing merge #2310
  • delta_rs don't seems to respect the row group size #2309
  • Auth error when running inside VS Code #2306
  • Unable to read deltatables with binary columns: Binary is not supported by JSON #2302
  • Schema evolution not coercing with Large arrow types #2298
  • Panic in deltalake_core::kernel::snapshot::log_segment::list_log_files_with_checkpoint::{{closure}} #2290
  • Checkpoint does not preserve reader and writer features for the table protocol. #2288
  • Z-Order with larger dataset resulting in memory error #2284
  • Successful writes return error when using concurrent writers #2279
  • Rust writer should raise when decimal types are incompatible (currently writers and puts table in invalid state) #2275
  • Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262
  • DeltaTable is not resilient to corrupted checkpoint state #2258
  • Inconsistent units of time #2256
  • Partition column comparison is an assertion rather than if block with raise exception #2242
  • Unable to merge column names starting from numbers #2230
  • Merging to a table with multiple distinct partitions in parallel fails #2227
  • cleanup_metadata not respecting custom logRetentionDuration #2180
  • Merge predicate fails with a field with a space #2167
  • When_matched_update causes records to be lost with explicit predicate #2158
  • Merge execution time grows exponetially with the number of column #2107
  • _internal.DeltaError when merging #2084

rust-v0.17.1 (2024-03-06)

Full Changelog

Implemented enhancements:

  • Get statistics metadata #2233
  • add option to append only a subsets of columns #2212
  • add documentation how to configure delta.logRetentionDuration #2072
  • Add drop constraint #2070
  • Add 0.16 deprecation warnings for DynamoDB lock #2049

Fixed bugs:

  • cleanup_metadata not respecting custom logRetentionDuration #2180
  • Rust writer panics on empty record batches #2253
  • DeltaLake executed Rust: write method not found in DeltaOps #2244
  • DELTA_FILE_PATTERN regex is incorrectly matching tmp commit files #2201
  • Failed to create checkpoint with "Parquet does not support writing empty structs" #2189
  • Error when parsing delete expressions #2187
  • terminate called without an active exception #2184
  • Now conda-installable on M1 #2178
  • Add error message for partition_by check #2177
  • deltalake 0.15.2 prints partitions_values and paths which is not desired #2176
  • cleanup_metadata can potentially delete most recent checkpoint, corrupting table #2174
  • Broken filter for newly created delta table #2169
  • Hash for StructField should consider more than the name #2045
  • Schema comparison in writer #1853
  • fix(python): sort before schema comparison #2209 (ion-elgreco)
  • fix: prevent writing checkpoints with a version that does not exist in table state #1863 (rtyler)

Closed issues:

  • Bug/Question: arrow'sFixedSizeList is not roundtrippable #2162

Merged pull requests:

rust-v0.17.0 (2024-02-06)

⚠️ The release of 0.17.0 removes the legacy dynamodb lock functionality. AWS users must read these release notes! ⚠️

File handlers

The 0.17.0 release moves storage implementations into their own crates, such as deltalake-aws. A consequence of that refactoring is that custom storage and file scheme handlers must be registered/initialized at runtime. Storage subcrates conventionally define a register_handlers function which performs that task. Users may see errors such as:

thread 'main' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/deltalake-core-0.17.0/src/table/builder.rs:189:48:
The specified table_uri is not valid: InvalidTableLocation("Unknown scheme: s3")
  • Users of the meta-crate (deltalake) can call the storage crate via: deltalake::aws::register_handlers(None); at the entrypoint for their code.
  • Users who adopt core and storage crates independently (e.g. deltalake-aws) can register via deltalake_aws::register_handlers(None);.

The AWS, Azure, and GCP crates must all have their custom file schemes registered in this fashion.
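
To illustrate why the "Unknown scheme: s3" error appears until a handler is registered, here is a minimal toy model of a scheme registry. It is only a sketch: `SchemeRegistry`, `Handler`, and `s3_handler` are hypothetical names invented for this example and are not deltalake types. In real code the registration step is the `register_handlers(None)` call described above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a storage backend factory; not a deltalake type.
type Handler = fn(&str) -> String;

/// Toy registry keyed by URI scheme, mimicking the idea of runtime registration.
#[derive(Default)]
struct SchemeRegistry {
    handlers: HashMap<String, Handler>,
}

impl SchemeRegistry {
    /// Register a handler for a scheme such as "s3".
    fn register(&mut self, scheme: &str, handler: Handler) {
        self.handlers.insert(scheme.to_string(), handler);
    }

    /// Resolve a table URI, failing for any scheme that was never registered.
    fn open(&self, table_uri: &str) -> Result<String, String> {
        let scheme = table_uri
            .split_once("://")
            .map(|(scheme, _)| scheme)
            .ok_or_else(|| format!("Missing scheme: {table_uri}"))?;
        match self.handlers.get(scheme) {
            Some(handler) => Ok(handler(table_uri)),
            None => Err(format!("Unknown scheme: {scheme}")),
        }
    }
}

/// Hypothetical S3 handler; a real one would construct an object store client.
fn s3_handler(table_uri: &str) -> String {
    format!("s3 store for {table_uri}")
}
```

In this model, calling `open("s3://bucket/table")` on a fresh registry returns the "Unknown scheme: s3" error, and the same call succeeds once `register("s3", s3_handler)` has run. This is why registration belongs at the entrypoint, before any table is opened.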

dynamodblock to S3DynamoDbLogStore

The locking mechanism is fundamentally different between deltalake v0.16.x and v0.17.0. Starting with this release, the deltalake and deltalake-aws crates rely on the same protocol for concurrent writes on AWS as the Delta Lake/Spark implementation.

Fundamentally the DynamoDB table structure changes, which is documented here. A Rust process should continue to set the AWS_S3_LOCKING_PROVIDER environment variable to dynamodb. The new table must be specified with the DELTA_DYNAMO_TABLE_NAME environment or configuration variable, which should name the new S3DynamoDbLogStore-compatible DynamoDB table.

Because locking is required to ensure safe, consistent writes, there is no iterative migration: 0.16 and 0.17 writers cannot safely coexist. The following steps should be taken when upgrading:

  1. Stop all 0.16.x writers
  2. Ensure all writes have completed and the lock table is empty.
  3. Deploy 0.17.0 writers
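
Before deploying 0.17.0 writers, it may help to verify the two settings named above. The following is a minimal sketch of such a pre-flight check using only the standard library. `check_locking_config` and `settings_from_env` are hypothetical helpers written for this example and are not part of deltalake. The sketch only checks that the values are present; it does not check that the DynamoDB table exists or has the new structure.

```rust
use std::collections::HashMap;

/// Check the two settings named in these release notes.
/// Returns the configured DynamoDB table name on success.
fn check_locking_config(settings: &HashMap<String, String>) -> Result<String, String> {
    // The locking provider must still be "dynamodb".
    match settings.get("AWS_S3_LOCKING_PROVIDER").map(String::as_str) {
        Some("dynamodb") => {}
        Some(other) => {
            return Err(format!(
                "AWS_S3_LOCKING_PROVIDER is '{other}', expected 'dynamodb'"
            ))
        }
        None => return Err("AWS_S3_LOCKING_PROVIDER is not set".to_string()),
    }
    // The new S3DynamoDbLogStore-compatible table must be named explicitly.
    match settings.get("DELTA_DYNAMO_TABLE_NAME") {
        Some(name) if !name.trim().is_empty() => Ok(name.clone()),
        _ => Err("DELTA_DYNAMO_TABLE_NAME is not set".to_string()),
    }
}

/// Read only the two relevant variables from the process environment.
fn settings_from_env() -> HashMap<String, String> {
    ["AWS_S3_LOCKING_PROVIDER", "DELTA_DYNAMO_TABLE_NAME"]
        .iter()
        .filter_map(|key| std::env::var(key).ok().map(|value| (key.to_string(), value)))
        .collect()
}
```

A writer's entrypoint could call `check_locking_config(&settings_from_env())` and refuse to start on `Err`, so that a misconfigured writer fails before it attempts a commit.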

Full Changelog

Implemented enhancements:

  • Expose the ability to compile DataFusion with SIMD #2118
  • Updating Table log retention configuration with write_deltalake silently changes nothing #2108
  • ALTER table, ALTER Column, Add/Modify Comment, Add/remove/rename partitions, Set Tags, Set location, Set TBLProperties #2088
  • Docs: Update docs for check constraints #2063
  • Don't ensure_table_uri when creating a table with_log_store #2036
  • Exposing custom_metadata in merge operation #2031
  • Support custom table properties via TableAlterer and write/merge #2022
  • Remove parquet2 crate support #2004
  • Merge operation that only touches necessary partitions #1991
  • store userMetadata on write operations #1990
  • Create Dask integration page #1956
  • Merge: Filtering on partitions #1918
  • Rethink the load_version and load_with_datetime interfaces #1910
  • docs: Delta Lake + Arrow Integration #1908
  • docs: Delta Lake + Polars integration #1906
  • Rethink decision to expose the public interface in namespaces #1900
  • Add documentation on how to build and run documentation locally #1893
  • Add API to create an empty Delta Lake table #1892
  • Implementing CHECK constraints #1881
  • Check Invariants are respecting table features for write paths #1880
  • Organize docs with single lefthand sidebar #1873
  • Make sure invariants are handled properly throughout the codebase #1870
  • Unable to use deltalake Schema in write_deltalake #1862
  • Add a Rust-backed engine for write_deltalake #1861
  • Run doctest in CI for Python API examples #1783
  • [RFC] Use arrow for checkpoint reading and state handling #1776
  • Expose Python exceptions in public module #1771
  • Expose cleanup_metadata or create_checkpoint_from_table_uri_and_cleanup to the Python API #1768
  • Expose convert_to_delta to Python API #1767
  • Add high-level checking for append-only tables #1759

Fixed bugs:

  • Row order no longer preserved after merge operation #2165
  • Error when reading delta table with IDENTITY column #2152
  • Merge on IS NULL condition doesn't work for empty table #2148
  • JsonWriter converts structured parsing error into plain string #2143
  • Pandas import error when merging tables #2112
  • test_repair_on_update broken in main #2109
  • WriteBuilder::with_input_execution_plan does not apply the schema to the log's metadata fields #2105
  • MERGE logical plan vs execution plan schema mismatch #2104
  • Partitions not pushed down #2090
  • Cant create empty table with write_deltalake #2086
  • Unexpected high costs on Google Cloud Storage #2085
  • Unable to read s3 table: Unknown scheme: s3 #2065
  • write_deltalake not respecting writer_properties #2064
  • Unable to read/write tables with the "gs" schema in the table_uri in 0.15.1 #2060
  • LockClient required error for S3 backend in 0.15.1 python #2057
  • Error while writing Pandas DataFrame to Delta Lake (S3) #2051
  • Error with dynamo locking provider on 0.15 #2034
  • Conda version 0.15.0 is missing files #2021
  • Rust panicking through Python library when a delete predicate uses a nullable field #2019
  • No snapshot or version 0 found, perhaps /Users/watsy0007/resources/test_table/ is an empty dir? #2016
  • Generic DeltaTable error: type_coercion in Struct column in merge operation #1998
  • Constraint expr not formatted during commit action #1971
  • .load_with_datetime() is incorrectly rounding to nearest second #1967
  • vacuuming log files #1965
  • Unable to merge uppercase column names #1960
  • Schema error: Invalid data type for Delta Lake: Null #1946
  • Python v0.14 wheel files not up to date #1945
  • python Release 0.14 is missing Windows wheels #1942
  • CI integration test fails randomly: test_restore_by_datetime #1925
  • Merge data freezes indefenetely #1920
  • Load DeltaTable from non-existing folder causing empty folder creation #1916
  • Reoptimizes merge bins with only 1 file, even though they have no effect. #1901
  • The Python Docs link in README.MD points to old docs #1898
  • optimize.compact() fails with bad schema after updating to pyarrow 8.0 #1889
  • Python build is broken on main #1856
  • Checkpoint error with Azure Synapse #1847
  • merge very slow compared to delete + append on larger dataset #1846
  • get_add_actions fails with deltalake 0.13 #1835
  • Handle PyArrow CVE-2023-47248 #1834
  • Delta-rs writer hangs with to many file handles open (Azure) #1832
  • Encountering NotATable("No snapshot or version 0 found, perhaps xxx is an empty dir?") #1831
  • write_deltalake is not creating checkpoints #1815
  • Problem writing tables in directory named with char ~ #1806
  • DeltaTable Merge throws in merging if there are uppercase in Schema. #1797
  • rust merge error - datafusion panics #1790
  • expose use_dictionary=False when writing Delta Table and running optimize #1772

Closed issues:

  • Is this print necessary? Can we remove this. #2110
  • Azure concurrent writes #2069
  • Fix docs deployment #1867
  • Add a header in old docs and direct users to new docs #1865

rust-v0.16.5 (2023-11-15)

Full Changelog

Implemented enhancements:

  • When will upgrade object_store to 0.8? #1858
  • No Official Help #1849
  • Auto assign GitHub issues with a "take" message #1791

Fixed bugs:

  • cargo clippy fails on core in main #1843

rust-v0.16.4 (2023-11-12)

Full Changelog

Implemented enhancements:

  • Unable to add deltalake git dependency to cargo.toml #1821

rust-v0.16.3 (2023-11-08)

Full Changelog

Implemented enhancements:

  • Docs: add release GitHub action #1799
  • Use bulk deletes where possible #1761

Fixed bugs:

  • Code Owners no longer valid #1794
  • MERGE works incorrectly with partitioned table if the data column order is not same as table column order #1787
  • errors when using pyarrow dataset as a source #1779
  • Write to Microsoft OneLake failed. #1764

rust-v0.16.2 (2023-10-21)

Full Changelog

rust-v0.16.1 (2023-10-21)

Full Changelog

rust-v0.16.0 (2023-09-27)

Full Changelog

Implemented enhancements:

  • Expose Optimize option min_commit_interval in Python #1640
  • Expose create_checkpoint_for #1513
  • integration tests regularly fail for HDFS #1428
  • Add Support for Microsoft OneLake #1418
  • add support for atomic rename in R2 #1356

Fixed bugs:

  • Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
  • [python] Different stringification of partition values in reader and writer #1653
  • Unable to interface with data written from Spark Databricks #1651
  • get_last_checkpoint does some unnecessary listing #1643
  • PartitionWriter's buffer_len doesn't include incomplete row groups #1637
  • Slack community invite link has expired #1636
  • delta-rs does not appear to support tables with liquid clustering #1626
  • Internal Parquet panic when using a Map type. #1619
  • partition_by with "$" on local filesystem #1591
  • ProtocolChanged error when performing append write #1585
  • Unable to cargo update using git tag or rev on Rust 1.70 #1580
  • NoMetadata error when reading detlatable #1562
  • Cannot read delta table: Delta protocol violation #1557
  • Update the CODEOWNERS to capture the current reviewers and contributors #1553
  • [Python] Incorrect file URIs when partition values contain escape character #1533
  • add documentation how to Query Delta natively from datafusion #1485
  • Python: write_deltalake to ADLS Gen2 issue #1456
  • Partition values that have been url encoded cannot be read when using deltalake #1446
  • Error optimizing large table #1419
  • Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
  • ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
  • Invalid JSON in log record missing field schemaString for DLT tables #1302
  • Special characters in partition path not handled locally #1299

Merged pull requests:

  • chore: bump rust crate version #1675 (rtyler)
  • fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
  • feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
  • docs: small consistency update in guide and readme #1666 (ion-elgreco)
  • fix: exception string in writer.py #1665 (sebdiem)
  • chore: increment python library version #1664 (wjones127)
  • docs: fix some typos #1662 (ion-elgreco)
  • fix: more consistent handling of partition values and file paths #1661 (roeap)
  • docs: add docstring to protocol method #1660 (MrPowers)
  • docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
  • fix: enable offset listing for s3 #1654 (eeroel)
  • chore: fix the incorrect Slack link in our readme #1649 (rtyler)
  • fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
  • chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
  • feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
  • fix: avoid excess listing of log files #1644 (eeroel)
  • fix: introduce support for Microsoft OneLake #1642 (rtyler)
  • fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
  • fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
  • chore: relax chrono pin to 0.4 #1635 (houqp)
  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
  • docs: update Readme #1633 (dennyglee)
  • chore: pin the chrono dependency #1631 (rtyler)
  • feat: pass known file sizes to filesystem in Python #1630 (eeroel)
  • feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
  • ci: fix python release #1624 (wjones127)
  • ci: extend azure timeout #1622 (wjones127)
  • feat: allow multiple incremental commits in optimize #1621 (kvap)
  • fix: change map nullable value to false #1620 (cmackenzie1)
  • Introduce the changelog for the last couple releases #1617 (rtyler)
  • chore: bump python version to 0.10.2 #1616 (wjones127)
  • perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
  • fix: don't re-encode paths #1613 (wjones127)
  • feat: use url parsing from object store #1592 (roeap)
  • feat: buffered reading of transaction logs #1549 (eeroel)
  • feat: merge operation #1522 (Blajda)
  • feat: expose create_checkpoint_for to the public #1514 (haruband)
  • docs: update Readme #1440 (roeap)
  • refactor: re-organize top level modules #1434 (roeap)
  • feat: integrate unity catalog with datafusion #1338 (roeap)

rust-v0.15.0 (2023-09-06)

Full Changelog

Implemented enhancements:

  • Configurable number of retries for transaction commit loop #1595

Fixed bugs:

  • Unable to read table using VM Managed Identity on Azure #1462
  • Unable to query by partition column #1445

Merged pull requests:

rust-v0.14.0 (2023-08-01)

Full Changelog

Implemented enhancements:

  • Define common dependencies in Cargo Workspace #1572
  • Make delta_datafusion::find_files public #1559

Fixed bugs:

  • Excessive integration test sizes causing builds to fail #1550
  • Slack invite link is not working #1530

Merged pull requests:

rust-v0.13.1 (2023-07-18)

Fixed bugs:

  • Revert premature merge of an attempted fix for binary column statistics #1544

rust-v0.13.0 (2023-07-15)

Full Changelog

Implemented enhancements:

  • Add nested struct supports #1518
  • Support FixedLenByteArray UUID statistics as a logical scalar #1483
  • Exposing create_add in the API #1458
  • Update features table on README #1404
  • docs(python): show data catalog options in Python API reference #1347
  • Add optimization to only list log files starting at a certain name #1252
  • Support configuring parquet compression #1235
  • parallel processing in Optimize command #1171

Fixed bugs:

  • get_add_actions() MAX is not showing complete value #1534
  • Can't get stats's minValues in add actions #1515
  • Pyarrow is_null filter not working as expected after loading using deltalake #1496
  • Can't write to table that uses generated columns #1495
  • Json error: Binary is not supported by JSON when writing checkpoint files #1493
  • _last_checkpoint size field is incorrect #1468
  • Error when Z Ordering a larger dataset #1459
  • Timestamp parsing issue #1455
  • File options are ignored when writing delta #1444
  • Slack Invite Link No Longer Valid #1425
  • cleanup_metadata doesn't remove .checkpoint.parquet files #1420
  • The test of reading the data from the blob storage located in Azurite container failed #1415
  • The test of reading the data from the bucket located in Minio container failed #1408
  • Datafusion: unreachable code reached when parsing statistics with missing columns #1374
  • vacuum is very slow on Cloudflare R2 #1366

Closed issues:

  • Expose Compression Options or WriterProperties for writing to Delta #1469
  • Support out-of-core Z-order using DataFusion #1460
  • Expose Z-order in Python #1442

Merged pull requests:

rust-v0.12.0 (2023-05-30)

Full Changelog

Implemented enhancements:

  • Release delta-rs 0.11.0 (next release after 0.10.0) #1362
  • Support writing statistics for date columns in Rust #1209

Fixed bugs:

  • Rust writer in operations makes a lot of data copies #1394
  • Unable to read timestamp fields from column statistics #1372
  • Unable to write custom metadata via configuration since version 0.9.0 #1353
  • .get_add_actions() returns wrong column statistics when dataSkippingNumIndexedCols property of the table was changed #1223
  • Ensure decimal statistics are written correctly in Rust #1208

Merged pull requests:

  • feat: add list_with_offset to DeltaObjectStore #1410 (ognis1205)
  • chore: type-check friendlier exports #1407 (roeap)
  • chore: remove ancillary crates from the git tree #1406 (rtyler)
  • chore: bump the version for the next release #1405 (rtyler)
  • feat: more efficient parquet writer and more statistics #1397 (wjones127)
  • perf: improve record batch partitioning #1396 (roeap)
  • chore: bump datafusion to 25 #1389 (roeap)
  • refactor!: remove DeltaDataType aliases #1388 (cmackenzie1)
  • feat: vacuum with concurrent requests #1382 (wjones127)
  • feat: add datafusion storage catalog #1381 (roeap)
  • docs: updated schema.rs to use the right signature for decimal data type in documentation #1377 (rahulj51)
  • fix: delete operation when partition and non partition columns are used #1375 (Blajda)
  • fix: add conversion for string for Field::TimestampMicros (#1372) #1373 (cmackenzie1)
  • fix: allow user defined config keys #1365 (roeap)
  • ci: disable full debug symbol generation #1364 (roeap)
  • fix: include stats for all columns (#1223) #1342 (mrjoe7)

rust-v0.11.0 (2023-05-12)

Full Changelog

Implemented enhancements:

  • Implement simple delete case #832

Merged pull requests:

  • chore: update Rust package version #1346 (rtyler)
  • fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
  • feat: delete operation #1176 (Blajda)
  • feat: add wasbs to known schemes #1345 (iajoiner)
  • test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
  • feat: write command improvements #1267 (roeap)
  • feat: added support for Databricks Unity Catalog #1331 (nohajc)
  • fix: double url encode of partition key #1324 (mrjoe7)

rust-v0.10.0 (2023-05-02)

Full Changelog

Implemented enhancements:

  • Support Optimize on non-append-only tables #1125

Fixed bugs:

  • DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
  • Datafusion: SQL projection returns wrong column for partitioned data #1292
  • Unable to query partitioned tables #1291

Merged pull requests:

  • chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
  • fix: handle local paths on windows #1322 (roeap)
  • fix: scan partitioned tables with datafusion #1303 (roeap)
  • fix: allow special characters in storage prefix #1311 (wjones127)
  • feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
  • Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
  • Enable the json feature for the parquet crate #1300 (rtyler)

rust-v0.9.0 (2023-04-14)

Full Changelog

Implemented enhancements:

  • hdfs support #300
  • Add decimal primitive type to document #1280
  • Improve error message when filtering on non-existent partition columns #1218

Fixed bugs:

  • Datafusion table provider: issues with timestamp types #441
  • Not matching column names when creating a RecordBatch from MapArray #1257
  • All stores created using DeltaObjectStore::new have an identical object_store_url #1188

Merged pull requests:

  • Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
  • chore: df / arrow changes after update #1288 (roeap)
  • feat: read schema from parquet files in datafusion scans #1266 (roeap)
  • HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
  • Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
  • Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
  • Simplify the Store Backend Configuration code #1265 (mrjoe7)
  • feat: optimistic transaction protocol #632 (roeap)
  • Write support for additional Arrow datatypes #1044 (chitralverma)
  • Unique delta object store url #1212 (gruuya)
  • improve err msg on use of non-partitioned column #1221 (marijncv)

rust-v0.8.0 (2023-03-10)

Full Changelog

Implemented enhancements:

  • feat(rust): support additional types for partition values #1170

Fixed bugs:

  • File pruning does not occur on partition columns #1175
  • Bug: Error loading Delta table locally #1157
  • Deltalake 0.7.0 with s3 feature compilation error due to rusoto_dynamodb version conflict #1191
  • Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186

rust-v0.7.0 (2023-02-11)

Full Changelog

Implemented enhancements:

  • Support FSCK REPAIR TABLE Operation #1092
  • Expose the Delta Log in a DataFrame that's easy for analysis #1031
  • Provide case-insensitive storage options in backend #999
  • Support local file path in CreateBuilder::with_location() #998
  • Save operational params in the same way with delta io #1054 (ismoshkov)

Fixed bugs:

  • DeltaTable DataFusion TableProvider does not support filter pushdown #1064
  • DeltaTable DataFusion scan does not prune files properly #1063
  • deltalake.DeltaTable constructor hangs in Jupyter #1093
  • Transaction log JSON formatting issue when writing data via Python bindings #1017
  • crates.io entry is missing link to rustdoc documentation #1076
  • URL Registered with ObjectStore registry is different from url in DeltaScan #1018
  • Not able to connect to Azure Storage with client id/secret #977
  • Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
  • Overwrite mode does not work with Azure #939
  • Use Chrono without default features #914
  • cargo test does not run due to tls conflict #985
  • Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910

Merged pull requests:

  • Make rustls default across all packages #1097 (wjones127)
  • Implement filesystem check #1103 (Blajda)
  • refactor: move vacuum command to operations module #1045 (roeap)
  • feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
  • feat: improve storage location handling #1065 (roeap)
  • Fix to support UTC timezone #1022 (andrei-ionescu)
  • feat: harmonize and simplify storage configuration #1052 (roeap)
  • feat: expose function to get table of add actions #1033 (wjones127)
  • fix: change unexpected field logging level to debug #1112 (houqp)
  • fix: datafusion predicate pushdown and dependencies #1071 (roeap)
  • fix: azure sas key url encoding #1036 (roeap)
  • Add provisional workaround to support CDC #1039 #1042 (Fazzani)
  • improve debuggability of json ser/de errors #1119 (houqp)
  • Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
  • minor: optimize partition lookup for vacuum loop #1120 (houqp)
  • Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
  • add test for null_count_schema_for_fields #1135 (marijncv)
  • add test for min_max_schema_for_fields #1122 (marijncv)
  • add test for get_boolean_from_metadata #1121 (marijncv)
  • add test for left_larger_than_right #1110 (marijncv)
  • Add test for: to_scalar_value #1086 (marijncv)
  • Fix typo in delta-inspect #1072 (byteink)
  • chore: update datafusion #1114 (roeap)

rust-v0.6.0 (2022-12-16)

Full Changelog

Implemented enhancements:

  • Support Apache Arrow DataFusion 15 #1020
  • Python package: Loosen version requirements for maturin #1004
  • Remove Cargo.lock from library crates and add Cargo.lock to binary ones #1000
  • More frequent Rust releases #969
  • Thoughts on adding read_delta to pandas #869
  • Add the support of the AWS_PROFILE environment variable for S3 #986 (fvaleye)

Fixed bugs:

  • Azure SAS signatures ending in "=" don't work #1003
  • Fail to compile deltalake crate, need to update dynamodb_lock in crates.io #1002
  • error reading delta table to pandas: runtime dropped the dispatch task #975
  • MacOS arm64 wheels are generated incorrectly #972
  • Overwrite creates new file #960
  • The written delta file has corrupted structure #956
  • Write mode doesn't work with Azure storage #955
  • Python: We don't error on reader protocol v2 #886
  • Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855

* This Changelog was automatically generated by github_changelog_generator