Title
Nebula: A Scalable Privacy-Preserving Machine Learning System in Ant Financial
Abstract
With the rapid growth of data volume, data-driven machine learning models have become an essential part of many industrial applications. Intuitively, the more high-quality data used for training, the better the model performs. In reality, however, data are usually scattered and isolated across different organizations or companies. This "data isolation" problem has motivated both academia and industry to explore the collaborative learning paradigm, in which better models are built jointly from multiple data sources. Despite the potential performance gains, this learning paradigm inevitably faces privacy issues, especially in the Fintech domain, where data are sensitive by nature. In this paper, we present Nebula, a privacy-preserving collaborative learning system in Ant Financial. Our system aims to facilitate privacy-preserving collaborative model training for industrial-scale applications. It is built upon a ring-allreduce, MPI-based distributed framework. On top of that, with several optimization strategies and a novel sharing scheme, our system scales up to tens of millions of data samples with hundreds of thousands of features and achieves more than 100x speedup compared with existing state-of-the-art implementations.
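For readers unfamiliar with ring-allreduce, the following is a minimal single-process Python sketch of the textbook sum ring-allreduce (a reduce-scatter phase followed by an allgather phase). It is only an illustration of the communication pattern under stated assumptions, not Nebula's implementation: it uses no MPI, treats each inner list as one worker's local vector (for example, a gradient), and leaves out the privacy-preserving sharing scheme entirely. The names ring_allreduce and chunk_bounds are hypothetical.

```python
# Hypothetical illustration: a textbook ring-allreduce (sum), not Nebula's code.
from typing import List, Tuple


def chunk_bounds(length: int, n: int) -> List[Tuple[int, int]]:
    """Split range(length) into n contiguous, nearly equal (start, end) chunks."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for c in range(n):
        end = start + base + (1 if c < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring-allreduce over n in-process "workers".

    buffers[i] is worker i's local vector; every worker ends up with the
    element-wise sum. In each step, every worker sends exactly one chunk to
    its right-hand neighbour (i + 1) % n.
    """
    n = len(buffers)
    data = [list(b) for b in buffers]  # copy so the inputs are not mutated
    if n <= 1:
        return data
    length = len(data[0])
    if any(len(b) != length for b in data):
        raise ValueError("all workers must hold vectors of the same length")
    bounds = chunk_bounds(length, n)

    # Phase 1: reduce-scatter. After n - 1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Build every outgoing message before applying any of them,
        # because all sends within one step happen simultaneously.
        msgs = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            msgs.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in msgs:
            lo, _ = bounds[c]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v

    # Phase 2: allgather. Each worker forwards the completed chunk it
    # most recently obtained; the receiver overwrites its partial copy.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            msgs.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in msgs:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
    return data


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
    print(ring_allreduce(grads)[0])  # → [111.0, 222.0, 333.0, 444.0]
```

In this simulation each worker sends 2 x (n - 1) chunks of roughly L/n elements, so per-worker traffic stays close to 2L as workers are added, which is the usual reason ring-allreduce is favoured for data-parallel training. In a real MPI deployment, each simulated step would correspond to a point-to-point exchange (for example MPI_Sendrecv) between ring neighbours.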
Year
2020
DOI
10.1145/3340531.3417418
Venue
CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020
DocType
Conference
ISBN
978-1-4503-6859-9
Citations
1
PageRank
0.36
References
3
Authors
8
Name           Order  Citations  PageRank
Chen Cen       1      162        25.61
Bingzhe Wu     2      18         6.41
Li Wang        3      22         4.52
Chaochao Chen  4      115        19.04
Jin Tan        5      1          0.70
Lei Wang       6      39         21.41
Jun Zhou       7      10         2.89
Benyu Zhang    8      2          1.04