Title
TSEngine: Enable Efficient Communication Overlay in Distributed Machine Learning in WANs
Abstract
In recent years, distributed machine learning in WANs (DML-WANs), i.e., collaboratively training a high-quality ML model across geo-distributed micro-clouds or edge devices, has attracted attention and been widely applied. Compared with cloud-centric training, DML-WANs avoids the high cost of transferring large amounts of raw data to a central cloud and mitigates the associated privacy concerns. However, DML-WANs still faces challenges. Model synchronization, an essential step of DML-WANs, involves heavy model communication across limited-bandwidth WANs, which incurs high communication overhead. Moreover, the widely used parameter server system performs model synchronization in a centralized manner, resulting in a serious communication in-cast problem. This in-cast further raises the communication overhead and lowers the efficiency of DML-WANs. To alleviate the communication in-cast, existing studies attempt to build tree-based communication overlays over the parameter server and workers. However, we find that these approaches cannot adapt to the dynamic and heterogeneous networks of DML-WANs, so their improvements are limited. This paper proposes TSEngine, an adaptive communication scheduler for efficient communication overlay of the parameter server system in DML-WANs. Its core idea is to dynamically schedule the communication logic over the parameter server and workers based on active network perception. Specifically, we propose novel communication scheduling protocols for model distribution and model aggregation, respectively. We have implemented TSEngine in a mainstream parameter server system and verified its effectiveness on DML-WAN testbeds.
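In a centralized parameter server setup every worker pushes its update straight to the server, so the server's fan-in equals the number of workers; a tree-based overlay instead lets some workers relay through others. The abstract does not describe TSEngine's scheduling protocols, so the following is only a minimal Python sketch of one generic bandwidth-aware overlay, assuming symmetric link bandwidths that have already been measured. All names (`build_overlay`, `fan_in`, the node labels `ps`/`w1`..., and the bandwidth figures) are hypothetical. The heuristic is a Prim-style maximum-bandwidth spanning tree rooted at the server, which maximizes the bottleneck bandwidth on each worker's path to the server but does not by itself cap any node's fan-in.

```python
# Hypothetical sketch, NOT the TSEngine protocol: build a tree overlay from
# measured link bandwidths so fewer workers push directly to the server.
from typing import Dict, List, Tuple

Node = str
Bandwidth = Dict[Tuple[Node, Node], float]  # assumed symmetric, e.g. Mbit/s


def bw(links: Bandwidth, a: Node, b: Node) -> float:
    """Look up a symmetric link bandwidth; unmeasured links count as 0."""
    return links.get((a, b), links.get((b, a), 0.0))


def build_overlay(root: Node, workers: List[Node], links: Bandwidth) -> Dict[Node, Node]:
    """Prim-style maximum-bandwidth spanning tree rooted at the server.

    Returns a parent map: during aggregation each worker sends its update
    to parent[worker]; for distribution the same edges run top-down.
    """
    parent: Dict[Node, Node] = {}
    attached = {root}
    remaining = set(workers)
    while remaining:
        # Attach the unattached worker that has the fastest link to the tree.
        child, via = max(
            ((w, a) for w in remaining for a in attached),
            key=lambda pair: bw(links, pair[0], pair[1]),
        )
        parent[child] = via
        attached.add(child)
        remaining.remove(child)
    return parent


def fan_in(parent: Dict[Node, Node], node: Node) -> int:
    """Number of children that push updates directly to `node`."""
    return sum(1 for p in parent.values() if p == node)


if __name__ == "__main__":
    links = {
        ("ps", "w1"): 100.0, ("ps", "w2"): 10.0, ("ps", "w3"): 10.0,
        ("w1", "w2"): 80.0, ("w1", "w3"): 60.0, ("w2", "w3"): 20.0,
    }
    tree = build_overlay("ps", ["w1", "w2", "w3"], links)
    print(tree)               # {'w1': 'ps', 'w2': 'w1', 'w3': 'w1'}
    print(fan_in(tree, "ps"))  # 1, versus 3 in the star layout
```

With these made-up bandwidths the server receives one stream instead of three, while `w1`, which has fast links to everyone, takes over the relaying. Rebuilding the tree whenever bandwidth measurements change is one simple way such an overlay could follow a dynamic WAN; the abstract indicates that TSEngine's own protocols are driven by active network perception, but their details are not given here.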
Year: 2021
DOI: 10.1109/TNSM.2021.3106315
Venue: IEEE Transactions on Network and Service Management
Keywords: Parameter server system, model distribution, model aggregation, communication overlay
DocType: Journal
Volume: 18
Issue: 4
ISSN: 1932-4537
Citations: 0
PageRank: 0.34
References: 0
Authors: 7
Name; Order; Citations; PageRank
Huaman Zhou; 1; 0; 0.34
Weibo Cai; 2; 0; 0.34
Zonghang Li; 3; 3; 2.77
Hongfang Yu; 4; 38; 8.17
Ling Liu; 5; 1; 1.05
Long Luo; 6; 0; 0.68
G. Sun; 7; 141; 14.63