Title
All-at-once Decomposition of Coupled Billion-scale Tensors in Apache Spark
Abstract
As the scale of unlabeled data rises, it becomes increasingly valuable to perform scalable, unsupervised data analysis. Tensor decompositions, which have been empirically successful at finding meaningful cross-dimensional patterns in multidimensional data, are a natural candidate to test for scalability and meaningful pattern discovery in these massive real-world datasets. Furthermore, the production of big data of different types necessitates the ability to mine patterns across disparate sources. The coupled tensor decomposition framework captures this idea by supporting the decomposition of several tensors from different data sources together. We present a scalable implementation of coupled tensor decomposition on Apache Spark. We introduce nonnegativity and sparsity constraints, and perform all-at-once quasi-Newton optimization of all factor matrix parameters. We present results showing the billion-scale scalability of this novel implementation and also demonstrate the high level of interpretability in the components produced, suggesting that coupled, all-at-once tensor decompositions on Apache Spark represent a promising framework for large-scale, unsupervised pattern discovery.
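The coupling and all-at-once ideas in the abstract can be illustrated with a small sketch. It is not the paper's Spark implementation: it assumes two dense 3-way tensors that share their first mode, a CP (sum of rank-one terms) model, and a squared-error loss, and it swaps the quasi-Newton optimizer for plain projected gradient descent, with nonnegativity enforced by clipping and no sparsity term. The function names (`cp_reconstruct`, `coupled_loss_and_grads`, `fit`) and all parameters are hypothetical.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: T[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def coupled_loss_and_grads(X, Y, A, B, C, D, E):
    """Squared-error loss for X ~ [[A,B,C]] and Y ~ [[A,D,E]], with gradients for every factor.

    The factor A is shared by both tensors, which is what couples the two decompositions.
    """
    Rx = cp_reconstruct(A, B, C) - X  # residual of the first tensor
    Ry = cp_reconstruct(A, D, E) - Y  # residual of the second tensor
    loss = 0.5 * (np.sum(Rx ** 2) + np.sum(Ry ** 2))
    # The shared factor receives gradient contributions from both residuals.
    gA = np.einsum("ijk,jr,kr->ir", Rx, B, C) + np.einsum("ilm,lr,mr->ir", Ry, D, E)
    gB = np.einsum("ijk,ir,kr->jr", Rx, A, C)
    gC = np.einsum("ijk,ir,jr->kr", Rx, A, B)
    gD = np.einsum("ilm,ir,mr->lr", Ry, A, E)
    gE = np.einsum("ilm,ir,lr->mr", Ry, A, D)
    return loss, [gA, gB, gC, gD, gE]


def fit(X, Y, rank, steps=500, lr=1e-2, seed=0):
    """All-at-once projected gradient descent (an assumed stand-in for quasi-Newton)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    _, L, M = Y.shape
    factors = [rng.random((n, rank)) for n in (I, J, K, L, M)]
    history = []
    for _ in range(steps):
        loss, grads = coupled_loss_and_grads(X, Y, *factors)
        history.append(loss)
        # All-at-once: every factor is updated from the same gradient evaluation,
        # then clipped at zero to keep the factors nonnegative.
        factors = [np.maximum(F - lr * g, 0.0) for F, g in zip(factors, grads)]
    return factors, history
```

Calling `fit(X, Y, rank=2)` on two small arrays whose first dimensions match returns the five factor matrices and the loss recorded at each step. The fixed step size `lr` is an assumption chosen for small examples; a larger problem would need a line search or a quasi-Newton method such as the one the abstract describes.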
Year: 2018
DOI: 10.1109/HPEC.2018.8547544
Venue: 2018 IEEE High Performance extreme Computing Conference (HPEC)
Keywords: coupled billion-scale tensors, Apache Spark, unlabeled data rises, scalable data analysis, unsupervised data analysis, tensor decompositions, meaningful cross-dimensional patterns, multidimensional data, natural candidate, meaningful pattern discovery, mine patterns, disparate sources, coupled tensor decomposition framework captures this idea, scalable implementation, billion-scale scalability, unsupervised pattern discovery, Big Data
Field: Interpretability, Spark (mathematics), Tensor, Matrix (mathematics), Computer science, Matrix decomposition, Stress (mechanics), Computational science, Big data, Scalability
DocType: Conference
ISSN: 2377-6943
ISBN: 978-1-5386-5990-8
Citations: 2
PageRank: 0.39
References: 10
Authors: 5
Name                        Order  Citations  PageRank
Aditya Gudibanda            1      2          0.39
Tom Henretty                2      2          0.73
Muthu Manikandan Baskaran   3      493        33.10
James R. Ezick              4      17         3.60
Richard Lethin              5      118        17.17