Abstract
---
Distributed systems incorporate GPUs because they provide massive parallelism in an energy-efficient manner. Unfortunately, existing programming models make it difficult to route a GPU-initiated network message. The traditional coprocessor model forces programmers to manually route messages through the host CPU. Other models allow GPU-initiated communication, but are inefficient for small messages.
To enable fine-grain PGAS-style communication between threads executing on different GPUs, we introduce Gravel. GPU-initiated messages are offloaded through a GPU-efficient concurrent queue to an aggregator (implemented with CPU threads), which combines messages targeting the same destination. Gravel leverages diverged work-group-level semantics to amortize synchronization across the GPU's data-parallel lanes.
Using Gravel, we can distribute six applications, each with frequent small messages, across a cluster of eight GPU-accelerated nodes. Compared to one node, these applications run 5.3x faster, on average. Furthermore, we show Gravel is more programmable and usually performs better than prior GPU networking models.
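The mechanism the abstract describes (producers enqueue small messages; a CPU-side aggregator coalesces those bound for the same destination into one batched send) can be sketched as follows. This is an illustrative sketch only, not Gravel's implementation: the queue, the `BATCH_SIZE` threshold, and all names are assumptions standing in for the GPU-efficient concurrent queue and CPU aggregator threads.

```python
# Hypothetical sketch of message aggregation: producer code (standing in
# for GPU work-groups) enqueues small (destination, payload) messages; an
# aggregator thread (standing in for Gravel's CPU aggregator) combines
# messages for the same destination and emits one batched "send".
import queue
import threading
from collections import defaultdict

BATCH_SIZE = 4          # assumed flush threshold, not from the paper
SENTINEL = object()     # marks the end of the message stream

def aggregator(q, sends):
    """Drain the queue, coalescing messages per destination node."""
    buffers = defaultdict(list)
    while True:
        msg = q.get()
        if msg is SENTINEL:
            break
        dest, payload = msg
        buffers[dest].append(payload)
        if len(buffers[dest]) >= BATCH_SIZE:
            # One network send carries BATCH_SIZE small messages.
            sends.append((dest, buffers.pop(dest)))
    for dest, payloads in buffers.items():  # flush any partial batches
        sends.append((dest, payloads))

q = queue.Queue()
sends = []
t = threading.Thread(target=aggregator, args=(q, sends))
t.start()
for i in range(8):                  # 8 small messages, 2 destinations
    q.put((i % 2, f"msg{i}"))
q.put(SENTINEL)
t.join()
```

After the run, `sends` holds two batched sends, one per destination, instead of eight individual small-message sends; amortizing per-message overhead this way is what makes fine-grain communication affordable.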
Year | DOI | Venue
---|---|---
2017 | 10.1145/3126908.3126914 | SC

Keywords | Field | DocType
---|---|---
message aggregation, graphics processing unit (GPU), fine-grain communication, partitioned global address space (PGAS) | Synchronization, News aggregator, Programming paradigm, Computer science, Massively parallel, Queue, Parallel computing, Thread (computing), Coprocessor, Semantics | Conference

ISSN | ISBN | Citations
---|---|---
2167-4329 | 978-1-4503-5114-0 | 1

PageRank | References | Authors
---|---|---
0.36 | 16 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Marc S. Orr | 1 | 91 | 4.49 |
Shuai Che | 2 | 1743 | 82.36 |
Bradford Beckmann | 3 | 2390 | 101.06 |
Mark Oskin | 4 | 906 | 76.63 |
Steven K. Reinhardt | 5 | 3885 | 226.69 |
David A. Wood | 6 | 6058 | 617.11 |