Abstract | ||
---|---|---|
Consider an M×N matrix in which the (i,j)-th entry represents the affinity between the i-th entity of the first type and the j-th entity of the second type. Co-clustering is an approach that clusters both types of entities simultaneously, using the affinities as the information guiding the clustering. Co-clustering has been found to achieve clustering and dimensionality reduction at the same time, and it is therefore finding application in various problems. The recently proposed Bregman co-clustering algorithm converts the co-clustering task into a search for an optimal approximation matrix. Although it is much more scalable, memory-based implementations have a severe computational bottleneck. In this paper we show that a significant fraction of the computations performed by the Bregman co-clustering algorithm map naturally to those performed by an on-line analytical processing (OLAP) engine, making the latter a well-suited data management engine for the algorithm. Based on this observation, we have developed a version of the Bregman co-clustering algorithm that works on top of OLAP. Our experiments show that this version is much more scalable, achieving an order of magnitude performance improvement over the memory-based implementation. We believe this unlocks the power of this novel technique for application to much larger datasets. |
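
The link between the approximation matrix and OLAP-style aggregation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes squared Euclidean distance as the Bregman divergence, uses plain block means as the approximation, and takes the row and column cluster assignments as given. The function name `block_mean_approximation` and its parameters are hypothetical.

```python
import numpy as np

def block_mean_approximation(Z, row_labels, col_labels, k, l):
    """Approximate Z by the mean of each (row cluster, column cluster) block.

    Assumption: squared Euclidean distance and block means only; this is an
    illustrative simplification, not the algorithm as published.
    """
    Z = np.asarray(Z, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)

    # One-hot membership matrices: R is M x k, C is N x l.
    R = np.eye(k)[row_labels]
    C = np.eye(l)[col_labels]

    # Roll-up step: SUM and COUNT per co-cluster, comparable to
    # GROUP BY (row_cluster, col_cluster) in an OLAP engine.
    block_sum = R.T @ Z @ C
    block_count = np.outer(R.sum(axis=0), C.sum(axis=0))

    # Empty co-clusters keep a mean of 0 to avoid division by zero.
    block_mean = np.divide(
        block_sum, block_count,
        out=np.zeros_like(block_sum), where=block_count > 0,
    )

    # Expand back to M x N: each entry is replaced by its block mean.
    Z_hat = block_mean[np.ix_(row_labels, col_labels)]
    return block_mean, Z_hat


if __name__ == "__main__":
    Z = [[1, 2, 10], [3, 4, 20]]
    block_mean, Z_hat = block_mean_approximation(Z, [0, 0], [0, 0, 1], k=1, l=2)
    print(block_mean)
    print(((np.asarray(Z, dtype=float) - Z_hat) ** 2).sum())
```

In this sketch the expensive part is the per-block sum and count, which is the kind of grouped aggregation an OLAP engine is built to compute.
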
Year | DOI | Venue |
---|---|---|
2008 | 10.1007/978-3-540-68125-0_90 | PAKDD |
Keywords | Field | DocType |
optimal approximation matrix,j_th entity,o scalable bregman co-clustering,mxn matrix,larger datasets,i_th entity,data management engine,bregman co-clustering algorithm,memory-based implementation,dimensionality reduction,co-clustering task,sql,data management,olap,data cube | Bottleneck,Data mining,Dimensionality reduction,Computer science,Input/output,Theoretical computer science,Artificial intelligence,Biclustering,Cluster analysis,Online analytical processing,Data cube,Machine learning,Scalability | Conference |
Volume | ISSN | ISBN |
5012 | 0302-9743 | 3-540-68124-8 |
Citations | PageRank | References |
3 | 0.40 | 12 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kuo-Wei Hsu | 1 | 53 | 6.38 |
Arindam Banerjee | 2 | 4716 | 233.98 |
Jaideep Srivastava | 3 | 5845 | 871.63 |