Title
An In-Network Parameter Aggregation using DPDK for Multi-GPU Deep Learning
Abstract
In distributed deep neural network training using remote GPU nodes, communication for gradient aggregation occurs iteratively between the remote nodes. This communication latency limits the benefit of distributed training even with faster GPUs. In distributed deep learning using remote GPUs, the workload of gradient aggregation is also imposed on a host machine. In this paper, we therefore propose to offload the gradient aggregation to a DPDK (Data Plane Development Kit) based network switch placed between the host machine and the remote GPUs. In this approach, the aggregation is completed in the network using the extra computation resources of the network switch. We evaluate the proposed switch in two settings: GPUs and the host communicating over standard IP, and over a PCI Express (PCIe) over 40Gbit Ethernet (40GbE) product. With standard IP communication, the aggregation is accelerated by 2.2-2.5x compared to aggregation executed by the host machine. With the PCIe over 40GbE product, the proposed switch outperforms aggregation by the host machine by 1.16x. This approach is thus useful for distributed training with multiple GPUs.
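As a rough illustration of the operation the switch offloads, the minimal C sketch below sums per-worker gradient chunks element-wise, which is the core computation an in-network aggregation switch would apply to payloads received from each GPU node. The packet I/O, framing, and DPDK initialization used in the paper are not reproduced here, and the worker count and chunk size are hypothetical placeholders.

/*
 * Minimal sketch (not the authors' code): element-wise gradient
 * aggregation as an in-network switch might perform it after
 * extracting float payloads from packets sent by each GPU node.
 * DPDK packet I/O and framing are omitted; plain arrays stand in
 * for the received payloads.
 */
#include <stdio.h>

#define NUM_WORKERS  4   /* hypothetical number of GPU nodes      */
#define CHUNK_FLOATS 8   /* hypothetical gradients per packet     */

/* Sum the gradient chunks from all workers into one aggregated chunk. */
static void aggregate(const float grads[NUM_WORKERS][CHUNK_FLOATS],
                      float out[CHUNK_FLOATS])
{
    for (int i = 0; i < CHUNK_FLOATS; i++) {
        float sum = 0.0f;
        for (int w = 0; w < NUM_WORKERS; w++)
            sum += grads[w][i];
        out[i] = sum;   /* the switch would return this to every node */
    }
}

int main(void)
{
    float grads[NUM_WORKERS][CHUNK_FLOATS];
    float aggregated[CHUNK_FLOATS];

    /* Fill in dummy per-worker gradients for illustration only. */
    for (int w = 0; w < NUM_WORKERS; w++)
        for (int i = 0; i < CHUNK_FLOATS; i++)
            grads[w][i] = 0.01f * (float)(w + 1);

    aggregate(grads, aggregated);

    for (int i = 0; i < CHUNK_FLOATS; i++)
        printf("aggregated[%d] = %f\n", i, aggregated[i]);
    return 0;
}

In the actual system, each aggregated chunk would be written back into an outgoing packet by the DPDK data path and returned to the GPU nodes, so the host machine never has to perform the summation itself.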
Year
2020
DOI
10.1109/CANDAR51075.2020.00021
Venue
2020 Eighth International Symposium on Computing and Networking (CANDAR)
Keywords
Distributed Deep Learning, GPU, DPDK
DocType
Conference
Volume
11
Issue
2
ISSN
2379-1888
ISBN
978-1-7281-8222-3
Citations
0
PageRank
0.34
References
0
Authors
3
Name              Order  Citations  PageRank
Masaki Furukawa   1      0          0.68
Tomoya Itsubo     2      0          0.68
Hiroki Matsutani  3      576        62.07