cuSolver may significantly improve the efficiency of the LCAO module #690

@haxushu

Background

According to our profiling of the ABACUS LCAO module, solving the generalized eigenvalue problem dominates the run time when the structure is large.

For example, for a 4x4x4 supercell containing 512 Si atoms, run with 8 processes, the cost of ELPA is as follows:

CLASS_NAME         NAME        TIME(Sec)  CALLS  AVG  PER%
Diago_LCAO_Matrix  elpa_solve  134.26     9      15   41

Consequently, a more efficient eigensolver would speed up the solve step of the LCAO module in ABACUS.
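For reference, the generalized eigenvalue problem mentioned above can be written as follows. H and S are the matrices named later in this issue (in LCAO, the Hamiltonian and overlap matrices); the symbols c_i, ε_i, and N are notation introduced here for illustration and do not come from the ABACUS source.

```latex
H\,c_i = \varepsilon_i\, S\, c_i, \qquad i = 1, \dots, N
```

Here N is the matrix dimension, ε_i is the i-th eigenvalue, and c_i is the corresponding eigenvector. A partial solve computes only some of the N pairs, while a full solve computes all of them.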

Describe the solution you'd like

cuSolver may be the best option. This conclusion is based on our report Eigensolver Benchmark, in which the eigenvector APIs of ELPA and cuSolver are benchmarked against several criteria. For the GPU-accelerated case, the time (in seconds) to compute either a subset or all of the eigenvectors was recorded with 1 process (1 OMP thread), nblk=32, and one V100 GPU.

SOLVER      elpa1+gpu(part)  elpa2+gpu(part)  elpa1+gpu(all)  elpa2+gpu(all)  cuSolver(all)
104(26)     0.002            0.002            0.01            0.02            0.005
832(153)    0.15             0.17             0.28            0.24            0.057
6656(1228)  48*              50^              63              62              1.069

*: 18 s when nblk is tuned to 512

^: 28 s when nblk is tuned to 128

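As shown above, for the two larger matrices cuSolver is much faster than ELPA, even though cuSolver computes all eigenvectors by default while ELPA only needs to compute a subset. In the largest case cuSolver takes 1.069 s, against 18 s for the best tuned ELPA run. Only for the smallest matrix is the partial ELPA solve faster (0.002 s versus 0.005 s).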
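The ratios behind this comparison can be recomputed directly from the table. The snippet below is only a small illustrative sketch: the timings are copied from the table and footnotes above, while the dictionary layout and the helper name `speedup_over_cusolver` are assumptions made for this example.

```python
# Timings in seconds, copied from the benchmark table above.
# Row labels and column names are kept exactly as they appear in the table.
timings = {
    "104(26)": {
        "elpa1+gpu(part)": 0.002, "elpa2+gpu(part)": 0.002,
        "elpa1+gpu(all)": 0.01, "elpa2+gpu(all)": 0.02,
        "cuSolver(all)": 0.005,
    },
    "832(153)": {
        "elpa1+gpu(part)": 0.15, "elpa2+gpu(part)": 0.17,
        "elpa1+gpu(all)": 0.28, "elpa2+gpu(all)": 0.24,
        "cuSolver(all)": 0.057,
    },
    "6656(1228)": {
        "elpa1+gpu(part)": 48.0, "elpa2+gpu(part)": 50.0,
        "elpa1+gpu(all)": 63.0, "elpa2+gpu(all)": 62.0,
        "cuSolver(all)": 1.069,
    },
}

# Tuned ELPA timings for the largest row, taken from the two footnotes.
tuned_largest = {"elpa1+gpu(part), nblk=512": 18.0,
                 "elpa2+gpu(part), nblk=128": 28.0}


def speedup_over_cusolver(row):
    """Return ELPA time divided by cuSolver time for every ELPA column.

    A value above 1 means cuSolver is faster; below 1 means ELPA is faster.
    """
    reference = row["cuSolver(all)"]
    return {name: t / reference
            for name, t in row.items() if name != "cuSolver(all)"}


ratios = {label: speedup_over_cusolver(row) for label, row in timings.items()}
tuned_ratios = {name: t / timings["6656(1228)"]["cuSolver(all)"]
                for name, t in tuned_largest.items()}

for label, row_ratios in ratios.items():
    for name, ratio in row_ratios.items():
        print(f"{label:>11}  {name:<16} {ratio:6.1f}x")
for name, ratio in tuned_ratios.items():
    print(f" 6656(1228)  {name:<26} {ratio:6.1f}x")
```

A ratio below 1 in the first row reflects that the partial ELPA solve is faster than cuSolver for the smallest matrix.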

Additional context

We (ByteDance) plan to implement cuSolver support in three steps:

  • Step 1: Support single-GPU acceleration. (We are currently at this step.)

  • Step 2: Support multi-GPU acceleration on a single node.

  • Step 3: Support multi-GPU acceleration across multiple nodes.

For Step 1, one possible strategy is for a single GPU to first gather the local data from all processes to form the full H and S matrices. After the cuSolver API cusolverDnDsygvd is called, the results are scattered back to the processes. Steps 2 and 3 depend on the cuSolverMG APIs, which are not yet mature, and will start at an appropriate time.
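The data movement in Step 1 can be sketched in plain Python. This is a minimal illustration, not the ABACUS implementation: it assumes the matrices are stored in the 2D block-cyclic layout (block size nblk) that ELPA expects, it replaces the MPI gather/scatter calls with in-memory dictionaries, and the `solve` argument is a hypothetical stand-in for the real cusolverDnDsygvd call. All function names are invented for this example.

```python
def owner_and_local(g, nb, nprocs):
    """Map a global index to (owning process, local index), 1D block-cyclic."""
    block = g // nb
    return block % nprocs, (block // nprocs) * nb + g % nb


def scatter(full, nb, prow, pcol):
    """Split a full square matrix into per-process local pieces.

    Returns {(process_row, process_col): {(local_i, local_j): value}}.
    In a real code this would be an MPI scatter from the GPU-owning rank.
    """
    n = len(full)
    local = {(pr, pc): {} for pr in range(prow) for pc in range(pcol)}
    for i in range(n):
        pr, li = owner_and_local(i, nb, prow)
        for j in range(n):
            pc, lj = owner_and_local(j, nb, pcol)
            local[(pr, pc)][(li, lj)] = full[i][j]
    return local


def gather(local, n, nb, prow, pcol):
    """Rebuild the full n x n matrix from the per-process pieces."""
    full = [[None] * n for _ in range(n)]
    for i in range(n):
        pr, li = owner_and_local(i, nb, prow)
        for j in range(n):
            pc, lj = owner_and_local(j, nb, pcol)
            full[i][j] = local[(pr, pc)][(li, lj)]
    return full


def gather_solve_scatter(local_h, local_s, n, nb, prow, pcol, solve):
    """Gather H and S, solve on one device, scatter the eigenvectors back."""
    h = gather(local_h, n, nb, prow, pcol)
    s = gather(local_s, n, nb, prow, pcol)
    eigenvalues, eigenvectors = solve(h, s)  # stand-in for the GPU solver
    return eigenvalues, scatter(eigenvectors, nb, prow, pcol)


def placeholder_solve(h, s):
    """NOT a solver: returns dummy eigenvalues and H itself as 'eigenvectors',
    only so that the data movement can be exercised without a GPU."""
    return [0.0] * len(h), h


# Small demo: 10 x 10 matrices on a 2 x 4 process grid (8 processes).
n, nb, prow, pcol = 10, 2, 2, 4
full_h = [[float(i * n + j) for j in range(n)] for i in range(n)]
full_s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
parts_h = scatter(full_h, nb, prow, pcol)
parts_s = scatter(full_s, nb, prow, pcol)
values, parts_vec = gather_solve_scatter(
    parts_h, parts_s, n, nb, prow, pcol, placeholder_solve)
```

The approach is simple but requires the full H and S to fit in the memory of one GPU, and the gather and scatter steps are serial communication that is not included in the timings above.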

Labels: Feature Discussed (the features will be discussed first but will not be implemented soon)
