Abstract |
---|
Pseudo-Uniform Memory Architectures hide the memory's throughput bottlenecks and the network's latency differences in order to provide near-peak average throughput for computations on large datasets. This obviates the need for application-level partitioning and load balancing between NUMA domains, but the performance of cross-core communication still depends on the actual placement of the involved variables and cores, which can result in significant variation within applications and between application runs. This paper analyses the pseudo-uniform memory latency on the Intel Xeon Phi Knights Corner processor, derives strategies for the optimised placement of important variables, and discusses the role of localised coordination in pUMA systems. For example, a basic cache line ping-pong benchmark showed a 3x speedup between adjacent cores. Therefore, pUMA systems combined with support for controlled placement of small datasets are an interesting option when processor-wide load balancing is difficult while localised coordination is feasible. |
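
The cache line ping-pong measurement mentioned in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' benchmark: the names `PaddedFlag` and `ping_pong_ns` are hypothetical, the 64-byte alignment is an assumed cache line size, and the sketch does not pin threads to cores, which a real placement study would have to do with platform-specific affinity calls.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// A single flag kept on its own cache line.
// Assumption: 64-byte cache lines; adjust for other hardware.
struct alignas(64) PaddedFlag {
    std::atomic<std::uint32_t> value{0};
};

// Bounces the flag between two threads `rounds` times and returns the
// mean round-trip time in nanoseconds. Hypothetical helper, not taken
// from the paper.
double ping_pong_ns(std::uint32_t rounds) {
    assert(rounds > 0);
    PaddedFlag flag;

    // Responder: waits for each odd value and answers with the next even one.
    std::thread responder([&flag, rounds] {
        for (std::uint32_t i = 0; i < rounds; ++i) {
            while (flag.value.load(std::memory_order_acquire) != 2 * i + 1) {
                // Yielding keeps the sketch usable on oversubscribed
                // machines, at the cost of measurement precision.
                std::this_thread::yield();
            }
            flag.value.store(2 * i + 2, std::memory_order_release);
        }
    });

    // Initiator: publishes an odd value, then waits for the even reply.
    auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < rounds; ++i) {
        flag.value.store(2 * i + 1, std::memory_order_release);
        while (flag.value.load(std::memory_order_acquire) != 2 * i + 2) {
            std::this_thread::yield();
        }
    }
    auto stop = std::chrono::steady_clock::now();
    responder.join();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / rounds;
}
```

Calling `ping_pong_ns(100000)` once for each pair of cores of interest, with the two threads pinned to those cores, would give a per-pair latency estimate; without pinning, the result only reflects wherever the scheduler happened to place the threads.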
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-58943-5_55 | Lecture Notes in Computer Science |

Field | DocType | Volume |
---|---|---|
Memory bank, Physical address, Computer science, CPU cache, Load balancing (computing), Parallel computing, Throughput, Obfuscation, Memory architecture, Cache coherence, Distributed computing | Conference | 10104 |

ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |

References | Authors |
---|---|
0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Randolf Rotta | 1 | 114 | 8.98 |
Robert Kuban | 2 | 0 | 0.68 |
Mark Simon Schöps | 3 | 0 | 0.34 |
Jörg Nolte | 4 | 29 | 10.00 |