Abstract | ||
---|---|---|
In this paper, we present the recent port of TifaMMy, our cache-oblivious parallel LU decomposition code, to two new architectures, along with the latest results on them: SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these architectures. Results for both the standard C++ version compiled with vectorization switches and TifaMMy's highly optimized vector intrinsics version are discussed and compared with Intel's architecture-specific, optimized Math Kernel Library (MKL). |
Year | DOI | Venue |
---|---|---|
2011 | 10.1109/HPCSim.2011.5999869 | High Performance Computing and Simulation |
Keywords | Field | DocType |
C++ language,cache storage,matrix decomposition,matrix multiplication,optimising compilers,parallel architectures,shared memory systems,C++ version,SGI UltraViolet,cache oblivious algorithm,distributed shared memory machine,matrix multiplication,optimized vector intrinsic version,parallel LU decomposition code TifaMMy,vectorization compiler switches,x86 architecture Sandy Bridge,block-recursive,cache-oblivious,linear algebra,parallelization,performance,shared memory platforms | Cache-oblivious algorithm,Shared memory,Computer science,Instruction set,Parallel computing,Vectorization (mathematics),Compiler,Distributed shared memory,Intrinsics,LU decomposition | Conference |
ISBN | Citations | PageRank |
978-1-61284-380-3 | 3 | 0.87 |
References | Authors | |
10 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexander Heinecke | 1 | 344 | 32.67 |
Carsten Trinitis | 2 | 151 | 29.80 |
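
The abstract and keywords describe the matrix multiplication as block-recursive and cache-oblivious. As a rough illustration of that general idea only, the sketch below halves the largest of the three dimensions until the sub-problem is small, so no cache size ever appears as a tuning parameter. It is not TifaMMy's actual code: the function name `rec_matmul`, the row-major layout, and the cutoff `kBase` are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of TifaMMy): computes C += A * B.
// A is m x k, B is k x n, C is m x n, all row-major.
// lda, ldb, ldc are the row strides of the enclosing full matrices.
void rec_matmul(const double* A, const double* B, double* C,
                std::size_t m, std::size_t n, std::size_t k,
                std::size_t lda, std::size_t ldb, std::size_t ldc) {
    // Assumed cutoff for this sketch; it only bounds recursion overhead
    // and is independent of any particular cache size.
    constexpr std::size_t kBase = 16;

    if (m <= kBase && n <= kBase && k <= kBase) {
        // Base case: plain triple loop on a small block.
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t p = 0; p < k; ++p)
                for (std::size_t j = 0; j < n; ++j)
                    C[i * ldc + j] += A[i * lda + p] * B[p * ldb + j];
        return;
    }

    const std::size_t largest = std::max({m, n, k});
    if (largest == m) {
        // Split the rows of A and C.
        const std::size_t h = m / 2;
        rec_matmul(A, B, C, h, n, k, lda, ldb, ldc);
        rec_matmul(A + h * lda, B, C + h * ldc, m - h, n, k, lda, ldb, ldc);
    } else if (largest == n) {
        // Split the columns of B and C.
        const std::size_t h = n / 2;
        rec_matmul(A, B, C, m, h, k, lda, ldb, ldc);
        rec_matmul(A, B + h, C + h, m, n - h, k, lda, ldb, ldc);
    } else {
        // Split the inner dimension; both halves accumulate into the same C.
        const std::size_t h = k / 2;
        rec_matmul(A, B, C, m, n, h, lda, ldb, ldc);
        rec_matmul(A + h, B + h * ldb, C, m, n, k - h, lda, ldb, ldc);
    }
}
```

For square `n x n` matrices stored contiguously, a call would look like `rec_matmul(A, B, C, n, n, n, n, n, n)` with `C` zero-initialized. The two halves of the row and column splits write to disjoint parts of `C`, which is what would make them candidates for parallel execution in a sketch like this.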