Title
Unsupervised identification of redundant domain entries in InterPro database using clustering techniques
Abstract
InterPro is a widely used database that integrates functional signatures from different protein sequence annotation databases, together with manual curation, to present a comprehensive resource for functional sequence annotation. However, integrating the signatures leads to inconsistent and/or redundant annotations in some cases. In this study, we propose an unsupervised method for the automatic detection of inconsistent and redundant entries in the InterPro database. Two clustering methods, the Markov Cluster Algorithm (MCL) and hierarchical clustering, are employed to investigate to what extent these signatures can be detected. Results show that a considerable share (~75%) of the redundant entries can be identified. The future goal is to develop a system that identifies redundant and inconsistent signatures with very high performance using supervised machine learning techniques. The findings of the study may help InterPro curators fix the problematic entries. The method may also serve curators as a road map before the integration of new signatures.
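As a rough illustration of how hierarchical clustering can group potentially redundant entries, the following minimal sketch clusters entries by the overlap of the protein sets they match. It is not the authors' implementation: the abstract does not state which features or distance measure were used, so the Jaccard distance, the single-linkage criterion, the `max_distance` cutoff, and the toy entry and protein names are all assumptions made for this example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of protein accessions."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage_clusters(matches, max_distance):
    """Agglomerative single-linkage clustering, stopped at max_distance.

    matches maps an entry name to the set of proteins it matches
    (an assumed feature representation, not taken from the paper).
    """
    clusters = [{name} for name in sorted(matches)]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i, j in combinations(range(len(clusters)), 2):
            d = min(
                jaccard_distance(matches[x], matches[y])
                for x in clusters[i]
                for y in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


# Hypothetical toy data: entry names and protein accessions are made up.
matches = {
    "entry_A": {"P1", "P2", "P3", "P4"},
    "entry_B": {"P1", "P2", "P3", "P4", "P5"},
    "entry_C": {"P6", "P7"},
    "entry_D": {"P6", "P7", "P8"},
    "entry_E": {"P9"},
}

clusters = single_linkage_clusters(matches, max_distance=0.4)
# Clusters with more than one member are candidates for redundancy review.
redundant_groups = [c for c in clusters if len(c) > 1]
print(redundant_groups)
```

In this sketch a cluster with more than one member is only a candidate for manual review; the cutoff of 0.4 is arbitrary and would need tuning against curated examples.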
Year: 2015
DOI: 10.1145/2808719.2811430
Venue: BCB
Field: Data mining, Computer science, Road map, Artificial intelligence, Cluster analysis, InterPro, Hierarchical clustering, Annotation, Pattern recognition, Markov chain, Bioinformatics, Hidden Markov model, Simple Modular Architecture Research Tool, Database
DocType: Conference
Citations: 0
PageRank: 0.34
References: 2
Authors: 3
Name | Order | Citations | PageRank
Ahmet Süreyya Rifaioglu | 1 | 0 | 0.34
Tunca Dogan | 2 | 21 | 3.00
Tolga Can | 3 | 268 | 16.39