Abstract | ||
---|---|---|
Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a $0.5$-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm SALSA for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor $0.5$, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than $0.5$-approximation when elements arrive in arbitrary order. Our experiments demonstrate that SALSA significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems. |
Year | Venue | DocType |
---|---|---|
2018 | ICML | Journal |
Volume | Citations | PageRank |
abs/1808.01842 | 0 | 0.34 |
References | Authors | |
17 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ashkan Norouzi-Fard | 1 | 5 | 5.84 |
Jakub Tarnawski | 2 | 16 | 6.34 |
Slobodan Mitrović | 3 | 30 | 7.68 |
Amir Zandieh | 4 | 8 | 3.56 |
Aidasadat Mousavifar | 5 | 0 | 0.34 |
Ola Svensson | 6 | 349 | 36.31 |