Abstract | ||
---|---|---|
With the rapid growth of data volume, data-driven machine learning models have become a necessary part of many industrial applications. Intuitively, training on more high-quality data leads to better model performance. In reality, however, data are usually scattered and isolated across different organizations or companies. This "data isolation" problem motivates both academia and industry to explore the collaborative learning paradigm, in which better models are built jointly from multiple data sources. Despite the potential performance gains, this learning paradigm inevitably faces privacy issues, especially in the Fintech domain, where data are sensitive by nature. In this paper, we present Nebula, a privacy-preserving collaborative learning system at Ant Financial. Our system aims to facilitate privacy-preserving collaborative model training for industrial-scale applications. It is built upon an MPI-based ring-allreduce distributed framework. On top of that, with several optimization strategies and a novel sharing scheme, our system scales up to tens of millions of data samples with hundreds of thousands of features and achieves more than 100x speedup compared with existing state-of-the-art implementations.
|
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3340531.3417418 | CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020 |
DocType | ISBN | Citations |
Conference | 978-1-4503-6859-9 | 1 |
PageRank | References | Authors |
0.36 | 3 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chen Cen | 1 | 162 | 25.61 |
Bingzhe Wu | 2 | 18 | 6.41 |
Li Wang | 3 | 22 | 4.52 |
Chaochao Chen | 4 | 115 | 19.04 |
Jin Tan | 5 | 1 | 0.70 |
Lei Wang | 6 | 39 | 21.41 |
Jun Zhou | 7 | 10 | 2.89 |
Benyu Zhang | 8 | 2 | 1.04 |