Title
TensorExpress: In-Network Communication Scheduling for Distributed Deep Learning
Abstract
TensorExpress provides in-network communication scheduling for distributed deep learning (DDL). In cloud-based DDL, parameter communication over the network is a key bottleneck. Previous studies proposed tensor packet reordering approaches to reduce network blocking time; however, network contention persists in DDL. TensorExpress mitigates this contention and reduces overall training time by scheduling tensor packets in-network using P4, a switch programming language. TensorExpress improves latency and network blocking time by up to 2.5x and 2.44x, respectively.
Year
2020
DOI
10.1109/CLOUD49709.2020.00014
Venue
2020 IEEE 13th International Conference on Cloud Computing (CLOUD)
Keywords
distributed deep learning, parameter server architecture, P4, communication scheduling, in-network delay
DocType
Conference
ISSN
2159-6182
ISBN
978-1-7281-8781-5
Citations
1
PageRank
0.41
References
3
Authors
4
Name            Order  Citations  PageRank
Minkoo Kang     1      3          2.44
Gyeongsik Yang  2      5          6.68
Yeonho Yoo      3      1          1.42
Chuck Yoo       4      98         20.58