Performance tips #13

Hi, guys!

Can you provide some tips for improving the performance of loading data from a Hive table into a Vertica cluster?

My use case for Vertica is as follows:
Spark jobs prepare the data (some transformations of business data) and write the results to a Hive table in ORC format. After that, I run a shell script that loads the data into Vertica:
`COPY vertica_schema.$my_table_new FROM $table_path_hdfs ON ANY NODE ORC`.
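For context, the load step can be sketched roughly as below. This is a simplified illustration, not the actual script: the default table name and HDFS path are placeholders, `build_copy_sql` and `RUN_LOAD` are hypothetical names, and it assumes `vsql` is installed with its connection settings already configured.

```shell
#!/bin/sh
# Simplified sketch of the load step; every value below is a placeholder.
my_table_new="${my_table_new:-my_table_new}"                       # hypothetical table name
table_path_hdfs="${table_path_hdfs:-hdfs:///path/to/orc/table/*}"  # hypothetical ORC location

# Build the COPY statement; the HDFS path is passed as a quoted SQL string.
build_copy_sql() {
    printf "COPY vertica_schema.%s FROM '%s' ON ANY NODE ORC;\n" "$1" "$2"
}

sql=$(build_copy_sql "$my_table_new" "$table_path_hdfs")

if [ "${RUN_LOAD:-0}" = "1" ]; then
    # Real run: send the statement to Vertica (assumes vsql can connect).
    vsql -c "$sql"
else
    # Dry run (default): only print the statement that would be executed.
    echo "$sql"
fi
```

By default the sketch only prints the statement; setting `RUN_LOAD=1` would actually run the single `COPY` over the whole table path, which is the step that takes more than 4 hours.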

The performance of this solution is terrible: for example, loading 1.5B rows takes more than 4 hours. How can I improve it?

Vertica cluster: 6 machines
Hadoop (Hive) cluster: 30 machines



