Abstract | ||
---|---|---|
Sparse direct solvers provide vital functionality for a wide variety of scientific applications. The dominant part of a sparse direct solver, LU factorization, suffers heavily from the irregularity of sparse matrices. Meanwhile, the specific characteristics of sparse solvers in circuit simulation and the unique sparsity pattern of circuit matrices open up a larger design space but also pose significant challenges. In this paper, we propose a sparse solver named FLU and re-examine the performance of LU factorization from the perspectives of vectorization, parallelization, and data locality. To improve vectorization efficiency and data locality, FLU introduces a register-level supernode computation method that carefully manages data movement. With alternating multi-column computation, FLU further greatly reduces off-chip memory accesses. Furthermore, we implement a fine-grained, elimination-tree-based parallelization scheme to fully exploit task-level parallelism. Experimental results on Intel Xeon show that FLU achieves speedups of up to 19.51x (3.86x on average) over PARDISO and up to 2.56x (1.66x on average) over NICSLU. |
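
The abstract mentions elimination-tree-based parallelization without giving details. As a rough illustration of the general idea only (not FLU's actual scheme), the sketch below builds an elimination tree for a matrix with a symmetric nonzero pattern and groups columns by tree height, so that columns in the same group have no ancestor/descendant relation and could be processed concurrently. The function names `elimination_tree` and `level_sets`, the input layout `col_rows`, and the symmetric-pattern assumption are all hypothetical choices made for this example.

```python
def elimination_tree(n, col_rows):
    """Return parent[] of the elimination tree (-1 marks a root).

    Assumption: the nonzero pattern is symmetric, and col_rows[j] lists
    the row indices i < j with a nonzero at (i, j) (strict upper triangle).
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut towards the current root
    for j in range(n):
        for i in col_rows[j]:
            # Climb from i towards its root, redirecting every visited
            # node to j so that later climbs are shorter.
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:
                    parent[i] = j  # i had no parent yet, so j becomes it
                i = nxt
    return parent


def level_sets(parent):
    """Group columns by height above the leaves of the tree."""
    n = len(parent)
    level = [0] * n
    # parent[j] > j always holds, so one ascending pass finalizes level[j]
    # before j propagates its height to its own parent.
    for j in range(n):
        p = parent[j]
        if p != -1:
            level[p] = max(level[p], level[j] + 1)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for j, lv in enumerate(level):
        groups[lv].append(j)
    return groups


if __name__ == "__main__":
    # Hypothetical 5x5 pattern with off-diagonal nonzeros at
    # (0,2), (1,2), (2,4), (3,4) and their symmetric counterparts.
    col_rows = [[], [], [0, 1], [], [2, 3]]
    parent = elimination_tree(5, col_rows)
    print(parent)              # [2, 2, 4, 4, -1]
    print(level_sets(parent))  # [[0, 1, 3], [2], [4]]
```

In this toy pattern, columns 0, 1, and 3 are leaves and can be handled independently, column 2 must wait for columns 0 and 1, and column 4 comes last. A solver for unsymmetric circuit matrices would need a dependency structure suited to unsymmetric LU, and the abstract does not say how FLU constructs or schedules its tree.
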
Year | DOI | Venue |
---|---|---|
2022 | 10.23919/DATE54114.2022.9774499 | PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022) |
Keywords | DocType | ISSN |
High Performance Computing, Circuit Simulation, Sparse LU Factorization | Conference | 1530-1591 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhiyuan Yan | 1 | 0 | 0.34 |
Biwei Xie | 2 | 0 | 0.34 |
Xingquan Li | 3 | 0 | 0.34 |
Yungang Bao | 4 | 361 | 31.11 |