Don't understand the result #17734

@bithw1

I am using Hudi 0.15.0. In Spark SQL, I ran the simple statements listed after the query output below.

When I run select * from hudi_cow_20251229_07, the result is as follows. I wonder why the rows (1,2,3) and (1,3,6) are gone. I am using the insert operation, so no duplicates should be dropped.

spark-sql> select * from hudi_cow_20251229_07;
_hoodie_commit_time     _hoodie_commit_seqno    _hoodie_record_key      _hoodie_partition_path  _hoodie_file_name       a       b       c
20251229154740370       20251229154740370_0_0   1               6efdbc56-1ebd-4cec-a1d4-6738aed8352b-0_0-15-21_20251229154740370.parquet        1       4       7

set hoodie.spark.sql.insert.into.operation=insert;
set hoodie.datasource.write.insert.drop.duplicates=false;
set hoodie.datasource.write.insert.dup.policy=none;

CREATE TABLE IF NOT EXISTS hudi_cow_20251229_07 (
  a INT,
  b INT,
  c INT
)
USING hudi
tblproperties(
  type='cow',
  primaryKey='a',
  hoodie.datasource.write.precombine.field='c'
);

insert into hudi_cow_20251229_07(a,b,c) values(1,2,3),(1,4,7),(1,3,6);
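
One observation about the output: the single surviving row, (1,4,7), is the row with the largest c among the three rows that share a=1. The sketch below is plain Python, not Hudi code. It only shows that keeping one row per primaryKey value, chosen by the highest precombine field value, reproduces the observed result. Whether Hudi actually takes this path under the settings above is an assumption and is not confirmed here; the function name keep_latest_per_key is hypothetical.

```python
# Illustration only: plain Python, not Hudi code.
# Rows are (a, b, c) tuples taken from the INSERT statement above.
rows = [(1, 2, 3), (1, 4, 7), (1, 3, 6)]


def keep_latest_per_key(rows):
    """For each key a, keep only the row with the largest c.

    Assumption: a plays the role of primaryKey and c the role of the
    precombine field, as declared in the table properties.
    """
    latest = {}
    for a, b, c in rows:
        # Replace the stored row only when the new row has a larger c.
        if a not in latest or c > latest[a][2]:
            latest[a] = (a, b, c)
    return list(latest.values())


survivors = keep_latest_per_key(rows)
print(survivors)  # [(1, 4, 7)]
```

If the three rows had distinct values of a, this sketch would keep all of them, which is the behavior I expected from the insert operation.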
