Title
Toward scalable matrix multiply on multithreaded architectures
Abstract
We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will likely also affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition, we advocate publishing low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16-CPU Itanium2 server supports these observations.
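To make the partitioning argument concrete, the sketch below illustrates the kind of simple one-dimensional decomposition the abstract argues against: each thread updates a contiguous panel of columns of C in C := C + A*B. This is not code from the paper; the use of C with OpenMP, column-major (BLAS-style) storage, and a naive inner loop in place of a tuned serial kernel are all assumptions made for illustration.

/*
 * Minimal sketch (not the authors' implementation): one-dimensional
 * partitioning of C := C + A*B among threads.  Each thread owns a
 * contiguous block of columns of C (and the matching columns of B).
 * Matrices are column-major with leading dimensions lda, ldb, ldc.
 */
#include <omp.h>

void gemm_1d(int m, int n, int k,
             const double *A, int lda,
             const double *B, int ldb,
             double *C, int ldc)
{
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid      = omp_get_thread_num();

        /* Assign thread tid the column range [j0, j1) of C. */
        int cols = (n + nthreads - 1) / nthreads;
        int j0 = tid * cols;
        int j1 = (j0 + cols < n) ? j0 + cols : n;

        /* Naive triple loop over the thread's column panel; a real
           library would invoke a tuned serial kernel here instead. */
        for (int j = j0; j < j1; j++)
            for (int p = 0; p < k; p++)
                for (int i = 0; i < m; i++)
                    C[i + j * ldc] += A[i + p * lda] * B[p + j * ldb];
    }
}

Under this decomposition every thread reads all of A, so the aggregate memory traffic for A grows with the number of threads; two-dimensional partitionings of C avoid this and, as the paper argues, scale better.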
Year
2007
DOI
10.1007/978-3-540-74466-5_79
Venue
Euro-Par
Keywords
smp architecture, contiguous memory, scalable matrix, memory architecture, cpu itanium2 server, linear algebra library, multithreaded architecture, simple one-dimensional data partitioning, shared memory architecture, dense linear algebra operation, always-important matrix-matrix multiplication, optimize parallel linear algebra, matrix multiplication, data collection, linear algebra, shared memory, multicore processors
Field
Linear algebra, Computer architecture, Uniform memory access, Shared memory, Computer science, Parallel computing, Distributed memory, Thread (computing), Distributed shared memory, Multi-core processor, Distributed computing, Scalability
DocType
Conference
Volume
4641
ISSN
0302-9743
ISBN
3-540-74465-7
Citations
10
PageRank
1.99
References
15
Authors
5
Name                       Order   Citations   PageRank
Bryan Marker               1       178         11.58
Field G. Van Zee           2       312         23.19
Kazushige Goto             3       397         26.88
Gregorio Quintana-Ortí     4       425         43.25
Robert A. van de Geijn     5       2047        203.08