Abstract | ||
---|---|---|
In recent years, general-purpose x86 architectures have undergone significant modifications towards high performance computing capabilities. Lately, technologies such as wider vector units and the Fused Multiply-Add (FMA) instruction, which were mainly known from GPU architectures, have been introduced. In this paper, we examine the performance of current x86 architectures, namely Intel Sandy Bridge and AMD Bulldozer, for four different parallel workloads with different properties. These properties comprise optimally cache-blocked algorithms as well as adaptive grid structures resulting in memory-latency-bound and bandwidth-bound executions. The achieved performance on both architectures is very promising and, if extrapolated towards upcoming server silicon, can be regarded as on par with current high-end GPU-based accelerators. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/ISPDC.2012.15 | Parallel and Distributed Computing |
Keywords | Field | DocType |
different parallel workloads,adaptive grid structure,x86 architecture,different property,fused multiply-add,amd bulldozer,exploiting state-of-the-art x86 architectures,current high-end gpu,gpu architectures,scientific computing,intel sandy bridge,high performance computing capability,instruction sets,symmetric matrices,matrix decomposition,vectors,memory latency,vectorization,multi core,registers,computer architecture,parallel processing | x86,Computer architecture,Supercomputer,Computer science,Instruction set,Parallel computing,Vectorization (mathematics),Bandwidth (signal processing),Multi-core processor,CAS latency,Grid,Distributed computing | Conference |
ISBN | Citations | PageRank |
978-1-4673-2599-8 | 2 | 0.44 |
References | Authors | |
4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexander Heinecke | 1 | 344 | 32.67 |
Thomas Auckenthaler | 2 | 2 | 0.44 |
Carsten Trinitis | 3 | 151 | 29.80 |