Title
DRPS: efficient disk-resident parameter servers for distributed machine learning
Abstract
The parameter server (PS), the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems typically rely on in-memory implementations. Under such memory constraints, machine learning (ML) developers cannot train large-scale ML models on their relatively small local clusters, and renting large-scale cloud servers is often economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between the problem of inconsistent parameter versions (staleness) and that of inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.
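To make the first idea in the abstract concrete, the following Python sketch shows a minimal disk-resident parameter store with an in-memory index, so that each parameter lookup costs one seek plus one read instead of a file scan. This is an illustration of the general technique only, not the DRPS implementation; the class and method names (DiskParameterStore, put, get) are invented for this example, and values are assumed to be fixed-width float64 scalars.

```python
import os
import struct

VALUE_SIZE = 8  # bytes per float64 parameter value

class DiskParameterStore:
    """Hypothetical disk-backed parameter store with an in-memory index."""

    def __init__(self, path):
        self.index = {}                  # parameter key -> byte offset on disk
        self.file = open(path, "w+b")    # parameters live on disk, not in RAM

    def put(self, key, value):
        if key in self.index:
            offset = self.index[key]     # update in place at the known offset
        else:
            self.file.seek(0, os.SEEK_END)
            offset = self.file.tell()    # append new parameter at end of file
            self.index[key] = offset
        self.file.seek(offset)
        self.file.write(struct.pack("d", value))

    def get(self, key):
        # One seek + one small read per lookup, guided by the index.
        offset = self.index[key]
        self.file.seek(offset)
        return struct.unpack("d", self.file.read(VALUE_SIZE))[0]

    def close(self):
        self.file.close()

# Usage: write two parameters, update one in place, and read it back.
store = DiskParameterStore("params.bin")
store.put("w_0", 0.5)
store.put("w_1", -1.25)
store.put("w_0", 0.42)
print(store.get("w_0"))  # -> 0.42
store.close()
```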
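The abstract's WSP model balances staleness against stragglers by selecting which workers participate in each step. The sketch below models one plausible such rule under stated assumptions: the server advances once at least min_workers have reached the current clock, and it excludes workers running more than staleness_bound clocks ahead of the slowest one. Both knob names and the selection rule itself are assumptions for illustration, not the paper's definition of WSP.

```python
def select_workers(clocks, current_clock, min_workers, staleness_bound):
    """Return worker ids allowed to contribute updates at current_clock.

    clocks: dict mapping worker id -> last iteration ("clock") it finished.
    Hypothetical rule: wait for a quorum of finishers (straggler control),
    then drop workers too far ahead of the slowest (staleness control).
    """
    ready = [w for w, c in clocks.items() if c >= current_clock]
    if len(ready) < min_workers:
        return []                    # keep waiting: too few finishers so far
    slowest = min(clocks.values())
    return [w for w in ready if clocks[w] - slowest <= staleness_bound]

# Usage: worker 3 lags behind; the server proceeds with workers 0, 1, 2
# once the quorum of 3 finishers is met.
clocks = {0: 5, 1: 5, 2: 6, 3: 4}
print(select_workers(clocks, current_clock=5, min_workers=3, staleness_bound=2))
# -> [0, 1, 2]
```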
Year
2022
DOI
10.1007/s11704-021-0445-2
Venue
Frontiers of Computer Science
Keywords
parameter servers, machine learning, disk resident, parallel model
DocType
Journal
Volume
16
Issue
4
ISSN
2095-2228
Citations
0
PageRank
0.34
References
14
Authors
4
Name           Order  Citations  PageRank
Song, Zhen     1      0          0.34
Yu Gu          2      201        34.98
Wang, Zhigang  3      0          0.34
Yu, Ge         4      0          0.34