Title
Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond
Abstract
In this paper, we present the recent port of TifaMMy, our cache-oblivious parallel LU decomposition code, to two new architectures, SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge, along with the latest results on both. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these architectures. Both the standard C++ version, built with the compiler's vectorization switches, and TifaMMy's highly optimized vector intrinsics version are discussed and compared with Intel's architecture-specific, optimized Math Kernel Library (MKL).
Year: 2011
DOI: 10.1109/HPCSim.2011.5999869
Venue: High Performance Computing and Simulation
Keywords: C++ language, cache storage, matrix decomposition, matrix multiplication, optimising compilers, parallel architectures, shared memory systems, C++ version, SGI UltraViolet, cache oblivious algorithm, distributed shared memory machine, optimized vector intrinsic version, parallel LU decomposition code TifaMMy, vectorization compiler switches, x86 architecture Sandy Bridge, block-recursive, cache-oblivious, linear algebra, parallelization, performance, shared memory platforms
Field: Cache-oblivious algorithm, Shared memory, Computer science, Instruction set, Parallel computing, Vectorization (mathematics), Compiler, Distributed shared memory, Intrinsics, LU decomposition
DocType: Conference
ISBN: 978-1-61284-380-3
Citations: 3
PageRank: 0.87
References: 10
Authors: 2
Name | Order | Citations | PageRank
Alexander Heinecke | 1 | 344 | 32.67
Carsten Trinitis | 2 | 151 | 29.80