Hi, guys!
Can you provide some tips for improving the performance of loading data from a Hive table into a Vertica cluster?
My use case for Vertica is as follows: Spark jobs prepare the data (some transformations of business data) and write the results to a Hive table in ORC format. After that, I run a shell script that loads the data into Vertica:
COPY vertica_schema.$my_table_new FROM $table_path_hdfs ON ANY NODE ORC.
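For context, here is a minimal sketch of what such a load script could look like. It is not the exact script: it assumes `vsql` is the client, and the table name, the HDFS path, the `build_copy_sql` helper, and the `RUN_LOAD` switch are all hypothetical placeholders added for illustration.

```shell
#!/bin/sh
# Sketch only: the table name and HDFS path below are hypothetical placeholders.

# Build the COPY statement for one ORC table directory.
build_copy_sql() {
    schema="$1"
    table="$2"
    hdfs_path="$3"
    printf "COPY %s.%s FROM '%s' ON ANY NODE ORC;" "$schema" "$table" "$hdfs_path"
}

sql=$(build_copy_sql "vertica_schema" "my_table_new" "hdfs:///path/to/table/*")

# Run the statement through vsql only when RUN_LOAD=1 (a hypothetical switch),
# so the script can be dry-run without a Vertica connection.
if [ "${RUN_LOAD:-0}" = "1" ]; then
    vsql -c "$sql"
else
    echo "$sql"
fi
```

Without `RUN_LOAD=1` the script only prints the generated statement, which makes it easy to check the path and table name before starting a long load.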
The performance of this solution is terrible: for example, loading 1.5B rows takes more than 4 hours. How can I improve it?
Vertica cluster hosts: 6 machines
Hadoop (hive) cluster: 30 machines