Abstract | ||
---|---|---|
Matrix multiplication $A^t A$ appears as intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$scriptstyle mathsf{T}$A, calls classical Strassenu0027s algorithm as sub-routine, decreasing the computational cost %(expressed in number of performed products) of the conventional $A^t A$ multiplication to $frac{2}{7}n^{log_2 7}$. It works for generic rectangular matrices and exploits the peculiar symmetry of the resulting product matrix for sparing memory. We used the MPI paradigm to implement A$scriptstyle mathsf{T}$A in parallel, and we tested its performances on a small subset of nodes of the Galileo cluster. Experiments highlight good scalability and speed-up, also thanks to minimal number of exchanged messages in the designed communication system. Parallel overhead and inherently sequential time fraction are negligible in the tested configurations. |
Year | Venue | DocType |
---|---|---|
2019 | arXiv: Distributed, Parallel, and Cluster Computing | Journal |
Volume | Citations | PageRank |
abs/1902.02104 | 0 | 0.34 |
References | Authors | |
9 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Viviana Arrigoni | 1 | 0 | 2.03 |
Annalisa Massini | 2 | 137 | 15.53 |