Abstract |
---|
As the scale of unlabeled data rises, it becomes increasingly valuable to perform scalable, unsupervised data analysis. Tensor decompositions, which have been empirically successful at finding meaningful cross-dimensional patterns in multidimensional data, are a natural candidate to test for scalability and meaningful pattern discovery in these massive real-world datasets. Furthermore, the production of big data of different types necessitates the ability to mine patterns across disparate sources. The coupled tensor decomposition framework captures this idea by supporting the decomposition of several tensors from different data sources together. We present a scalable implementation of coupled tensor decomposition on Apache Spark. We introduce nonnegativity and sparsity constraints, and perform all-at-once quasi-Newton optimization of all factor matrix parameters. We present results showing the billion-scale scalability of this novel implementation and also demonstrate the high level of interpretability in the components produced, suggesting that coupled, all-at-once tensor decompositions on Apache Spark represent a promising framework for large-scale, unsupervised pattern discovery. |
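
The abstract does not state the objective function being optimized. As a hedged illustration only, one common way to write a coupled, nonnegative, sparsity-regularized CP decomposition is sketched below. The two-tensor setup, the shared first mode, the squared-error loss, the L1 penalty, and every symbol ($\mathcal{X}$, $\mathcal{Y}$, $A$ through $E$, $\lambda$, $\theta$) are assumptions made for illustration, not details taken from the paper.

```latex
% Illustrative sketch only; not the paper's stated formulation.
% Requires amsmath. Assumed setup: two third-order tensors
% X (I x J x K) and Y (I x L x M) share their first mode, so the
% factor matrix A (I x R) appears in both CP models.
\begin{align}
\min_{A,B,C,D,E \,\ge\, 0} \; f(\theta)
  &= \tfrac{1}{2}\bigl\| \mathcal{X} - [\![A,B,C]\!] \bigr\|_F^2
   + \tfrac{1}{2}\bigl\| \mathcal{Y} - [\![A,D,E]\!] \bigr\|_F^2
   + \lambda \sum_{M \in \{A,B,C,D,E\}} \| M \|_1 , \\
\theta
  &= \bigl[\operatorname{vec}(A);\ \operatorname{vec}(B);\ \operatorname{vec}(C);\
           \operatorname{vec}(D);\ \operatorname{vec}(E)\bigr] , \\
% Gradient with respect to the shared factor, using mode-1 unfoldings
% X_(1) ~ A (C kr B)^T and Y_(1) ~ A (E kr D)^T, where \odot is the
% Khatri-Rao product. On the nonnegative orthant the L1 term is linear,
% so it contributes lambda times an all-ones matrix.
\frac{\partial f}{\partial A}
  &= -\bigl(X_{(1)} - A (C \odot B)^{\mathsf T}\bigr)(C \odot B)
     -\bigl(Y_{(1)} - A (E \odot D)^{\mathsf T}\bigr)(E \odot D)
     + \lambda\, \mathbf{1}_{I \times R} .
\end{align}
```

Under these assumptions the coupling shows up as the two residual terms in the gradient of the shared factor $A$: both data sources pull on it at once. In an all-at-once scheme, a quasi-Newton method updates the whole of $\theta$ in each iteration using such gradients for every factor, rather than cycling through the factors one at a time as alternating least squares does.
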
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/HPEC.2018.8547544 | 2018 IEEE High Performance extreme Computing Conference (HPEC) |
Keywords | Field | DocType |
---|---|---|
coupled billion-scale tensors,Apache Spark,unlabeled data,scalable data analysis,unsupervised data analysis,tensor decompositions,meaningful cross-dimensional patterns,multidimensional data,natural candidate,meaningful pattern discovery,mine patterns,disparate sources,coupled tensor decomposition framework,scalable implementation,billion-scale scalability,unsupervised pattern discovery,Big Data | Interpretability,Spark (mathematics),Tensor,Matrix (mathematics),Computer science,Matrix decomposition,Stress (mechanics),Computational science,Big data,Scalability | Conference |
ISSN | ISBN | Citations |
---|---|---|
2377-6943 | 978-1-5386-5990-8 | 2 |
PageRank | References | Authors |
---|---|---|
0.39 | 10 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Aditya Gudibanda | 1 | 2 | 0.39 |
Tom Henretty | 2 | 2 | 0.73 |
Muthu Manikandan Baskaran | 3 | 493 | 33.10 |
James R. Ezick | 4 | 17 | 3.60 |
Richard Lethin | 5 | 118 | 17.17 |