Code of Conduct
Search before asking
Describe the bug
set spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
use hive_catalog;
drop table test_part_table;
create table test_part_table(
word string,
num bigint
)partitioned by(dt string) stored as orc;
drop table test_part_table_tmp;
create table test_part_table_tmp(
word string,
num bigint,
dt string
);
insert into test_part_table_tmp (word,num,dt) values('1',1,'1111'),('2',2,'2222'),('3',4,'1111');
insert overwrite table test_part_table partition (dt) select word,num,dt from test_part_table_tmp;
org.apache.hadoop.fs.FileAlreadyExistsException: /warehouse/tablespace/hive/test_part_table/.hive-staging_hive_2026-02-26_12-41-55_305_5577159179436818095-1/-ext-10000/_temporary/0/_temporary/attempt_202602261241555610893446772809343_0000_m_000000_0/dt=1111/part-00000-6a1697f8-a24a-40dd-b926-6fd6634c0323.c000 for client 192.168.1.57 already exists
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:389)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2732)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2625)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:807)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:496)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
- Location of the buggy code
org.apache.kyuubi.spark.connector.hive.write.FileWriterFactory, line 51
A single task thread writes rows for multiple partitions. Whenever the partition value changes, Spark closes the current writer and creates a new one. Because the input rows are not clustered by partition value, the writer for the same partition can be created more than once, and the ORC and Parquet writers do not allow the output file to already exist, so the second creation fails with FileAlreadyExistsException.
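The collision can be sketched with a minimal simulation (this is not Kyuubi's actual code; class name, file-name scheme, and row data are illustrative). If the file name is derived only from the task id and the partition value, revisiting a partition reproduces the same path:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of a dynamic-partition write where the writer is closed and
// reopened on every partition change. Because the file name depends only on
// the task id and the partition value, revisiting dt=1111 produces the same
// path a second time, mirroring the FileAlreadyExistsException in the report.
public class PartitionWriteSketch {
    public static void main(String[] args) {
        // Rows as (dt, word); dt=1111 appears, then 2222, then 1111 again,
        // matching the repro data ('1',1,'1111'),('2',2,'2222'),('3',4,'1111').
        String[][] rows = {{"1111", "1"}, {"2222", "2"}, {"1111", "3"}};
        Set<String> createdFiles = new HashSet<>();
        String currentPartition = null;
        for (String[] row : rows) {
            if (!row[0].equals(currentPartition)) {
                // Partition changed: close the old writer, open a new one.
                currentPartition = row[0];
                String path = "dt=" + currentPartition + "/part-00000-task0.orc";
                if (!createdFiles.add(path)) {
                    // Second creation of the same path -> collision.
                    System.out.println("COLLISION: " + path);
                }
            }
        }
    }
}
```

A common remedy (hedged, not necessarily the fix Kyuubi would adopt) is to mix a fresh unique component, such as a per-writer UUID or an incrementing counter, into the file name each time a writer is created, so reopening a partition never reuses an existing path.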
Another issue
create table test_table(
word string,
num bigint
)stored as orc;
insert into test_table values('1',1111);
select * from test_table;
1 1111
insert into test_table values('2',1111);
select * from test_table;
2 1111
1 1111
In batch processing, Spark may need to read the same Hive table multiple times within one job, and the data returned by those repeated reads should be identical.
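The expectation above can be sketched as snapshot semantics: capture the table's file listing once at planning time and reuse it, instead of re-listing the directory on every read. The sketch below is illustrative only (class name and file names are hypothetical, and it models the table as a plain list of files):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of snapshot vs. re-listing reads. A batch query that snapshots the
// file listing at plan time is unaffected by files a later INSERT adds; a
// query that re-lists the table directory on each read sees the new file.
public class SnapshotReadSketch {
    public static void main(String[] args) {
        // Hypothetical table state: the data files currently in the directory.
        List<String> tableFiles = new ArrayList<>(List.of("part-0001.orc"));
        // Plan-time snapshot: copy the listing once.
        List<String> snapshot = List.copyOf(tableFiles);
        // A later INSERT adds a new file to the table directory.
        tableFiles.add("part-0002.orc");
        // Snapshot-based read is stable; re-listing picks up the new file.
        System.out.println("snapshot read:   " + snapshot);
        System.out.println("re-listing read: " + tableFiles);
    }
}
```

Under this model, the two SELECTs in the repro would only return different rows if each read re-lists the table, which appears to be the behavior being reported.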
Affects Version(s)
1.10.3
Kyuubi Server Log Output
Kyuubi Engine Log Output
Kyuubi Server Configurations
Kyuubi Engine Configurations
Additional context
No response
Are you willing to submit PR?