Title
Biclustering Sparse Binary Genomic Data.
Abstract
Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, ranging from many rows and few columns to few rows and many columns. The algorithm also allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.
Year
2008
DOI
10.1089/cmb.2008.0066
Venue
JOURNAL OF COMPUTATIONAL BIOLOGY
Keywords
biclustering, binary data, transcription factor binding
Field
Row, Row and column spaces, Matrix (mathematics), Computer science, Artificial intelligence, Binary data, Biclustering, Bioinformatics, TRANSFAC, Machine learning, Sparse matrix, Binary number
DocType
Journal
Volume
15
Issue
10
ISSN
1066-5277
Citations
12
PageRank
0.99
References
11
Authors
3
Name, Order, Citations, PageRank
Miranda Van Uitert, 1, 103, 6.97
Wouter Meuleman, 2, 69, 2.71
Lodewyk F A Wessels, 3, 337, 22.28