Abstract
---
Federated learning (FL) nowadays involves compound learning tasks as cognitive applications' complexity increases. For example, a self-driving system hosts multiple tasks simultaneously (e.g., detection, classification, etc.) and expects FL to sustain life-long intelligence across them. However, our analysis demonstrates that deploying compound FL models for multiple training tasks on a GPU raises two issues: (1) because different tasks' skewed data distributions and corresponding models cause highly imbalanced learning workloads, current GPU scheduling methods lack effective resource allocation; (2) consequently, existing FL schemes, which focus only on heterogeneous data distribution but not on runtime computing, cannot practically achieve optimally synchronized federation. To address these issues, we propose a full-stack FL optimization scheme that handles both intra-device GPU scheduling and inter-device FL coordination for multi-task training. Specifically, our work illustrates two key insights in this research domain: (1) competitive resource sharing is beneficial for parallel model executions, and the proposed concept of "virtual resource" can effectively characterize and guide practical per-task resource utilization and allocation; (2) FL can be further improved by taking architecture-level coordination into consideration. Our experiments demonstrate that FL throughput can be significantly increased.
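The paper itself is not reproduced here, but as a rough illustration of the "virtual resource" idea the abstract describes, the Python sketch below maps each task's estimated workload (local data volume times per-sample model cost) to a fractional GPU share. All names, fields, and the profiling numbers are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of proportional "virtual resource" allocation, assuming
# per-task workload can be profiled ahead of time. Hypothetical example only.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    samples_per_round: int   # skewed local data volume for this task
    flops_per_sample: float  # profiled model cost per sample (GFLOPs)


def virtual_resource_shares(tasks):
    """Map each task's estimated workload to a fractional GPU share."""
    loads = {t.name: t.samples_per_round * t.flops_per_sample for t in tasks}
    total = sum(loads.values())
    return {name: load / total for name, load in loads.items()}


tasks = [
    Task("detection", samples_per_round=800, flops_per_sample=4.1),
    Task("classification", samples_per_round=5000, flops_per_sample=0.6),
]

for name, share in virtual_resource_shares(tasks).items():
    # A share like this could guide a concrete mechanism such as an
    # MPS thread percentage or per-stream batch sizing.
    print(f"{name}: {share:.0%} of GPU")
```

Balancing shares this way would let imbalanced tasks finish their local rounds at similar times, which is the precondition for the synchronized federation the abstract targets.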
Year | DOI | Venue
---|---|---
2022 | 10.1145/3487553.3524859 | International World Wide Web Conference

DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34

References | Authors
---|---
0 | 9

Name | Order | Citations | PageRank
---|---|---|---
Yongbo Yu | 1 | 0 | 0.34 |
Fuxun Yu | 2 | 0 | 0.34 |
Zirui Xu | 3 | 0 | 0.34 |
Di Wang | 4 | 1337 | 143.48 |
Minjia Zhang | 5 | 0 | 0.34 |
Ang Li | 6 | 501 | 36.38 |
Shawn Bray | 7 | 0 | 0.34 |
Chenchen Liu | 8 | 90 | 17.45 |
Chen Xiang | 9 | 31 | 35.72 |