Title
Co-clustering for Dual Topic Models.
Abstract
Biclustering is a data mining method that simultaneously clusters the two dimensions of a matrix, its rows and columns. A bicluster typically corresponds to a sub-matrix that exhibits some coherent tendency. A traditional biclustering task for categorical variables is to find heavy sub-graphs that correspond to significant biclusters, i.e., biclusters with high co-occurrence values. Although algorithms have been proposed to extract sub-graph biclusters, they provide limited knowledge about the relative importance of individual biclusters, as well as the importance of the variables within each bicluster. To address these problems, there have been several attempts to employ Bayesian methods or mixture models based on information theory. Although these models can rank the biclusters and the variables within a specific bicluster, they do not aim at extracting heavy sub-graph biclusters. Moreover, they constrain the search so that every cell in the matrix must belong to some bicluster. We attempt to relax these constraints by employing dual topic models. In particular, we first propose a generalised Latent Dirichlet Allocation (LDA) topic model that obtains dual topics, i.e., topics in opposite directions: row topics and column topics. To achieve better topics, it applies joint reinforcement, i.e., it considers column topics while creating row topics, and vice versa. Heavy sub-graph biclusters, i.e., the associations with high co-occurrence, are then extracted using thresholds. We demonstrate that our proposed Co-clustering for Dual Topic model is useful for obtaining heavy sub-graph biclusters by testing it on simulated data, a text corpus, and microarray gene expression data. The experimental results show that the biclusters extracted by the Co-clustering for Dual Topic model are better than those extracted by traditional biclustering models.
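The threshold-based extraction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the row-topic and column-topic weights have already been estimated, that row topic k and column topic k are paired by index, and that a single cut-off per direction is used. The names extract_biclusters, mean_cooccurrence, row_thresh and col_thresh are hypothetical, and the toy numbers are made up for illustration.

```python
# Hypothetical sketch: turn paired row/column topic weights into biclusters
# by thresholding. Assumes topic k on the row side matches topic k on the
# column side; the abstract does not specify this pairing.

def extract_biclusters(row_topic, col_topic, row_thresh=0.5, col_thresh=0.5):
    """Return one (rows, cols) pair per topic whose members pass the thresholds."""
    n_topics = len(row_topic[0])
    biclusters = []
    for k in range(n_topics):
        # Keep rows and columns whose weight on topic k reaches the cut-off.
        rows = [i for i, w in enumerate(row_topic) if w[k] >= row_thresh]
        cols = [j for j, w in enumerate(col_topic) if w[k] >= col_thresh]
        # Cells covered by no topic are simply left out of every bicluster.
        if rows and cols:
            biclusters.append((rows, cols))
    return biclusters


def mean_cooccurrence(matrix, rows, cols):
    """Average cell value inside a bicluster, as a rough 'heaviness' score."""
    total = sum(matrix[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


# Toy co-occurrence matrix (3 rows, 4 columns) with two dense blocks.
matrix = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 3, 4],
]
# Assumed topic weights: row_topic[i][k] and col_topic[j][k].
row_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
col_topic = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]

biclusters = extract_biclusters(row_topic, col_topic)
for rows, cols in biclusters:
    print(rows, cols, mean_cooccurrence(matrix, rows, cols))
```

Raising either threshold shrinks the biclusters toward their strongest members, and a row or column that passes no threshold stays unassigned, which reflects the abstract's point that not every cell has to belong to a bicluster.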
Year: 2016
Venue: Australasian Conference on Artificial Intelligence
Field: Information theory, Data mining, Latent Dirichlet allocation, Computer science, Categorical variable, Topic model, Biclustering, Cluster analysis, Mixture model, Bayesian probability
DocType: Conference
Citations: 1
PageRank: 0.34
References: 7
Authors: 3
Name            Order  Citations  PageRank
Santosh Kumar   1      7          1.86
Xiaoying Gao    2      220        32.95
Ian S. Welch    3      120        18.53