Title
LU factorization on heterogeneous systems: an energy-efficient approach towards high performance.
Abstract
Dense lower–upper (LU) factorization (hereafter, LU) is a critical kernel widely used to solve dense linear algebra problems. Hybrid LU algorithms have been designed to exploit the full capacity of heterogeneous systems. However, existing heterogeneous implementations are typically CPU-centric: they rely heavily on CPU cores and incur a large volume of data transfers over the PCIe bus, which reduces the overall energy efficiency of the system. In this paper, we provide a coprocessor-resident implementation of LU for a heterogeneous platform that improves energy efficiency by relieving the CPUs of heavy computation and avoiding excessive data transfers over PCIe. To maintain performance, we pipeline the CPU computation, coprocessor computation, MPI communication, and PCIe transfers between the CPUs and coprocessors. Experiments on the Tianhe-2 supercomputer show that our LU implementation is competitive in performance with the highly optimized Intel MKL implementation and overcomes the energy-efficiency limitations of CPU-centric implementations.
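The abstract does not spell out the factorization itself. As background only, the following is a minimal pure-Python sketch of a generic textbook blocked right-looking LU with partial pivoting. It is not the paper's implementation; the names `blocked_lu`, `lu_product`, and the block size `nb` are illustrative assumptions. The three phases per block step (panel factorization, triangular solve, trailing update) are the pieces that hybrid implementations commonly split between CPUs and coprocessors.

```python
def blocked_lu(a, nb=2):
    """Factor the square matrix `a` (list of lists of floats) in place.

    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U. The returned list `perm`
    satisfies: row i of L*U equals row perm[i] of the original matrix.
    Hypothetical helper for illustration, not code from the paper.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(0, n, nb):
        kb = min(nb, n - k)  # width of the current panel
        # Phase 1: panel factorization on columns k .. k+kb-1.
        for j in range(k, k + kb):
            # Partial pivoting: pick the largest entry in column j.
            p = max(range(j, n), key=lambda r: abs(a[r][j]))
            if a[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                a[j], a[p] = a[p], a[j]  # swap whole rows
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]  # multiplier, stored as L[i][j]
                for c in range(j + 1, k + kb):
                    a[i][c] -= a[i][j] * a[j][c]
        # Phase 2: triangular solve, U12 = inverse(L11) * A12.
        for j in range(k, k + kb):
            for i in range(j + 1, k + kb):
                for c in range(k + kb, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Phase 3: trailing update, A22 -= L21 * U12.
        for i in range(k + kb, n):
            for c in range(k + kb, n):
                s = 0.0
                for j in range(k, k + kb):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
    return perm


def lu_product(lu):
    """Multiply the packed factors back together and return L*U."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for c in range(n):
            s = 0.0
            for j in range(min(i, c) + 1):
                l_ij = 1.0 if i == j else lu[i][j]
                s += l_ij * lu[j][c]
            out[i][c] = s
    return out


# Small demonstration: the residual between L*U and the row-permuted input.
original = [[2.0, 1.0, 1.0, 0.0],
            [4.0, 3.0, 3.0, 1.0],
            [8.0, 7.0, 9.0, 5.0],
            [6.0, 7.0, 9.0, 8.0]]
work = [row[:] for row in original]
perm = blocked_lu(work, nb=2)
product = lu_product(work)
residual = max(abs(product[i][c] - original[perm[i]][c])
               for i in range(4) for c in range(4))
print("max residual:", residual)
```

In this sketch the trailing update (phase 3) dominates the arithmetic for large matrices, which is why it is the usual candidate for offloading; how the paper actually partitions and pipelines the work is not stated in the abstract.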
Year
2017
DOI
10.1007/s00607-016-0537-2
Venue
Computing
Keywords
LU factorization, Heterogeneous, Energy efficiency, MIC, 68W10
Field
Kernel (linear algebra), Central processing unit, Supercomputer, Efficient energy use, Computer science, Parallel computing, Coprocessor, PCI Express, Multi-core processor, LU decomposition
DocType
Journal
Volume
99
Issue
8
ISSN
1436-5057
Citations
4
PageRank
0.39
References
19
Authors
4
Name, Order, Citations, PageRank
Cheng Chen, 1, 7, 1.12
Jianbin Fang, 2, 265, 25.31
Tao Tang, 3, 42, 7.44
Canqun Yang, 4, 188, 29.39