Abstract |
---|
We present a fast randomized least-squares solver for distributed-memory platforms. Our solver is based on the Blendenpik algorithm, but employs a batchwise randomized unitary transformation scheme. The batchwise transformation enables our algorithm to scale the distributed-memory vanilla implementation of Blendenpik by up to ×3 and provides up to ×7.5 speedup over a state-of-the-art scalable least-squares solver based on the classic QR-based algorithm. Experimental evaluations on terabyte-scale matrices demonstrate excellent speedups on up to 16384 cores on a Blue Gene/Q supercomputer. |
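
The abstract does not say which unitary transform is used or how the batches are formed. As a rough illustration only, the sketch below assumes a Walsh-Hadamard transform combined with random sign flips, and assumes that "batchwise" means processing the columns in groups. The names `fwht`, `randomized_transform`, `batch_size`, and `seed` are hypothetical and are not taken from the paper.

```python
import math
import random


def fwht(vec):
    """Return the orthonormal Walsh-Hadamard transform of vec.

    The length of vec must be a power of two.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform norm-preserving
    return [x * scale for x in out]


def randomized_transform(columns, batch_size, seed=0):
    """Apply x -> H D x to every column, batch_size columns at a time.

    D is a random +/-1 diagonal shared by all columns and H is the
    orthonormal Walsh-Hadamard matrix, so H D is orthogonal.
    Batching by columns is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n = len(columns[0])
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    mixed = []
    for start in range(0, len(columns), batch_size):
        batch = columns[start:start + batch_size]
        for col in batch:
            mixed.append(fwht([s * x for s, x in zip(signs, col)]))
    return mixed


def norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))
```

Because the transform is orthogonal, it leaves column norms unchanged while spreading a concentrated column evenly over all rows. In this sketch the batch size only changes how the columns are grouped for processing, not the result.
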
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2832080.2832083 | ScalA@SC |

DocType | Citations | PageRank |
---|---|---|
Conference | 1 | 0.40 |

References | Authors |
---|---|
9 | 6 |

Name | Order | Citations | PageRank |
---|---|---|---|
Chander Iyer | 1 | 1 | 0.40 |
Haim Avron | 2 | 316 | 28.52 |
Georgios Kollias | 3 | 4 | 2.12 |
Yves Ineichen | 4 | 35 | 4.07 |
Christopher D. Carothers | 5 | 1022 | 61.60 |
Petros Drineas | 6 | 2165 | 201.55 |