| Title |
|---|
| NUMA-aware scheduling and memory allocation for data-flow task-parallel applications |
| Abstract |
|---|
| Dynamic task parallelism is a popular programming model on shared-memory systems. Compared to data-parallel, loop-based concurrency, it promises enhanced scalability, load balancing and locality. These promises, however, are undermined by non-uniform memory access (NUMA) systems. We show that it is possible to preserve the uniform hardware abstraction of contemporary task-parallel programming models, for both computing and memory resources, while achieving near-optimal data locality. Our run-time algorithms for NUMA-aware task and data placement are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences and reuse. This information is readily available in the run-time systems of modern task-parallel programming frameworks, and from the operating system regarding the placement of previously allocated memory. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability through the elimination of false dependences and enables fine-grained dynamic control over the placement of application data. We demonstrate that the benefits of dynamically managing data placement outweigh the privatization cost, even when comparing with target-specific optimizations through static, NUMA-aware data interleaving. Our implementation and the experimental evaluation on a set of high-performance benchmarks executing on a 192-core system with 24 NUMA nodes show that the fraction of local memory accesses can be increased to more than 99%, resulting in a speedup of up to 5× compared to a NUMA-aware hierarchical work-stealing baseline. |
| Year | DOI | Venue |
|---|---|---|
| 2016 | 10.1145/2851141.2851193 | PPoPP |
| Field | DocType | Volume |
|---|---|---|
| Programming language, Programming paradigm, Computer science, Concurrency, Task parallelism, Load balancing (computing), Parallel computing, Memory management, Speedup, Data flow diagram, Scalability, Distributed computing | Conference | 51 |
| Issue | ISSN | Citations |
|---|---|---|
| 8 | 0362-1340 | 0 |
| PageRank | References | Authors |
|---|---|---|
| 0.34 | 5 | 5 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Andi Drebes | 1 | 29 | 4.06 |
| Antoniu Pop | 2 | 198 | 14.36 |
| Karine Heydemann | 3 | 116 | 13.65 |
| Nathalie Drach | 4 | 71 | 9.05 |
| Albert Cohen | 5 | 85 | 10.03 |