Title
Finding Additive Biclusters with Random Background
Abstract
The biclustering problem has been extensively studied in many areas including e-commerce, data mining, machine learning, pattern recognition, statistics, and more recently in computational biology. Given an n×mmatrix A(n茂戮驴 m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of A) is optimized. The problem has been proved or conjectured to be NP-hard under various mathematical models. In this paper, we study a probabilistic model of the implanted additive bicluster problem, where each element in the n×mbackground matrix is a random number from [0, L茂戮驴 1], and a k×kimplanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L茂戮驴 1] with probability 茂戮驴. We propose an O(n2m) time voting algorithm to solve the problem. We show that for any constant 茂戮驴such that $(1-\delta)(1-\theta)^2 -\frac 1 L 0$, when $k \ge \max \left\{\frac 8 \alpha \sqrt{n\log n},~ \frac {8 \log n} c + \log (2L)\right\}$, where cis a constant number, the voting algorithm can correctly find the implanted bicluster with probability at least $1 - \frac{9}{n^{2}}$. We also implement our algorithm as a software tool for finding novel biclusters in microarray gene expression data, called VOTE. The implementation incorporates several nontrivial ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with multiple (and overlapping) implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with a high accuracy and speed.
Year
DOI
Venue
2008
10.1007/978-3-540-69068-9_25
CPM
Keywords
Field
DocType
novel biclusters,small biclusters,biclustering problem,kimplanted additive bicluster,time voting algorithm,additive bicluster problem,constant number,random background,random number,finding additive biclusters,error-free additive bicluster,log n,probabilistic model,machine learning,pattern recognition,chernoff bound,mathematical model,objective function,computational biology,data mining,e commerce
Row,Software tool,Discrete mathematics,Binary logarithm,Combinatorics,Voting algorithm,Matrix (mathematics),Statistical model,Biclustering,Chernoff bound,Mathematics
Conference
Volume
ISSN
Citations 
5029
0302-9743
1
PageRank 
References 
Authors
0.35
20
4
Name
Order
Citations
PageRank
Jing Xiao118116.09
Lusheng Wang22433224.97
Xiaowen Liu3323.60
Tao Jiang41809155.32