Title
Sparse nonnegative matrix factorization for protein sequence motif discovery
Abstract
Discovering motifs in protein sequences is a critical and challenging task in bioinformatics. It involves clustering similar protein segments drawn from a huge collection of protein sequences and then culling high-quality motifs from the resulting clusters. A granular computing strategy combined with the K-means clustering algorithm was previously proposed for this task, but it requires manually selecting biologically meaningful clusters to serve as the initial condition. Such a manipulated clustering method is undisciplined as well as computationally expensive. In this paper, we use sparse non-negative matrix factorization (SNMF) to cluster a large protein data set. We show how to combine this method with the Fuzzy C-means algorithm and incorporate bio-statistics information to increase the number of clusters with high structural similarity. Our experimental results show that the SNMF approach yields better protein groupings in terms of secondary-structure similarity while maintaining similarity in the protein primary sequences.
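As a rough illustration of how a sparse non-negative factorization can be read as a clustering, the sketch below factors a small non-negative matrix A (features by items) into W and H and assigns each item to the component with the largest coefficient in its column of H. This is not the authors' algorithm: it assumes a simple L1-penalized multiplicative-update scheme, and the names sparse_nmf and cluster_labels, the parameter lam, the evenly spaced column initialization, and the toy matrix (standing in for encoded protein segments) are all hypothetical choices made for illustration.

```python
import numpy as np


def sparse_nmf(A, k, lam=0.1, n_iter=200, eps=1e-9):
    """Factor A (features x items) as W @ H with W, H >= 0 and an L1 penalty on H."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    # Assumption: deterministic start from k evenly spaced data columns.
    idx = [j * n // k for j in range(k)]
    W = A[:, idx] + 0.01  # small offset keeps every entry strictly positive
    H = np.ones((k, n))
    for _ in range(n_iter):
        # Multiplicative update for H; lam in the denominator is the L1 penalty.
        H *= (W.T @ A) / (W.T @ W @ H + lam + eps)
        # Standard multiplicative update for W (no penalty on W here).
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        # Rescale to unit-norm basis columns so the penalty on H stays meaningful;
        # the product W @ H is unchanged by this step.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


def cluster_labels(H):
    """Assign each item (column of H) to the component with the largest coefficient."""
    return H.argmax(axis=0)


# Toy data: 4 features x 6 items, two groups with disjoint feature support.
A = np.array([
    [5, 4, 5, 0, 0, 0],
    [4, 5, 5, 0, 0, 0],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 0, 2, 1, 2],
], dtype=float)

W, H = sparse_nmf(A, k=2)
labels = cluster_labels(H)
```

Reading cluster membership directly from the columns of H is what makes the sparsity penalty useful here: the sparser each column, the less ambiguous the assignment. A soft membership scheme such as Fuzzy C-means could be applied to the columns of H instead of the hard argmax, but that step is not sketched here.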
Year: 2011
DOI: 10.1016/j.eswa.2011.04.133
Venue: Expert Syst. Appl.
Keywords: sparse non-negative matrix factorization, similar protein segment, challenging task, snmf approach, large protein data, k-means clustering algorithm, fuzzy c-means, better protein grouping, protein sequence motif, sparse nonnegative matrix factorization, clustering method, clustering, protein primary sequence, protein sequence, chou–fasman parameters, protein sequence motif discovery, fuzzy c-means algorithm, non negative matrix factorization, initial condition, granular computing, structural similarity, k means clustering, secondary structure, nonnegative matrix factorization
Field: Cluster (physics), Protein sequencing, Pattern recognition, Computer science, Fuzzy logic, Matrix decomposition, Granular computing, Structural similarity, Non-negative matrix factorization, Artificial intelligence, Cluster analysis, Machine learning
DocType:
Journal: Expert Systems With Applications
Volume: 38
Issue: 10
ISSN:
Citations: 7
PageRank: 0.44
References: 29
Authors: 5
Name            Order   Citations   PageRank
Wooyoung Kim    1       398         37.86
Bernard Chen    2       114         15.75
Jingu Kim       3       339         14.34
Yi Pan          4       2507        203.23
Haesun Park     5       3546        232.42