Performance tips #13

@Almaz-KG

Hi, guys!

Can you provide some tips for improving the performance of loading data from a Hive table into a Vertica cluster?

My use case for Vertica is as follows:
Spark jobs prepare the data (some transformations of business data) and write the results to a Hive table in ORC format. After that, I run a shell script that loads the data into Vertica:
COPY vertica_schema.$my_table_new FROM $table_path_hdfs ON ANY NODE ORC.
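
For readers who want to reproduce the setup, here is a minimal sketch of how such a load script might assemble the statement. The function name `build_copy_sql`, the table name, and the HDFS path are hypothetical placeholders (the post does not give the real values), and the quoting of the path and the final `vsql -c` call are assumptions about how the script is run.

```shell
#!/bin/sh
# Hypothetical placeholder values; the original post does not give the real ones.
my_table_new="my_table_new"
table_path_hdfs="hdfs:///path/to/hive/table/*"

# Build the COPY statement described in the post.
# $1 = target table name, $2 = HDFS path (or glob) of the ORC files
build_copy_sql() {
    printf "COPY vertica_schema.%s FROM '%s' ON ANY NODE ORC;\n" "$1" "$2"
}

sql=$(build_copy_sql "$my_table_new" "$table_path_hdfs")
echo "$sql"

# Assumption: the statement is then passed to the Vertica client, for example:
#   vsql -c "$sql"
```

Printing the statement before running it makes it easy to check that the variables expanded to the intended table and path.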

The performance of this solution is terrible: for example, loading 1.5B rows takes more than 4 hours. How can I improve it?
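
To put that number in perspective, the figures above can be turned into a rough throughput estimate. This is only back-of-the-envelope arithmetic using the row count, the 4-hour duration, and the 6 Vertica machines from this post; since the load takes more than 4 hours, the results are upper bounds. The variable names are illustrative.

```shell
#!/bin/sh
rows=1500000000          # 1.5B rows, from the post
seconds=$((4 * 3600))    # 4 hours; the post says "> 4 hours"
vertica_nodes=6          # Vertica cluster size, from the post

# Integer division: results are truncated to whole rows per second.
rows_per_sec=$((rows / seconds))
rows_per_sec_per_node=$((rows_per_sec / vertica_nodes))

echo "cluster: at most ${rows_per_sec} rows/s"
echo "per Vertica node: at most ${rows_per_sec_per_node} rows/s"
```

That works out to roughly 104,000 rows per second for the whole cluster, or about 17,000 rows per second per Vertica node.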

Vertica cluster hosts: 6 machines
Hadoop (hive) cluster: 30 machines
