Title |
---|
Unsupervised identification of redundant domain entries in InterPro database using clustering techniques |
Abstract |
---|
InterPro is a widely used database that integrates, with manual curation, the functional signatures provided by different protein sequence annotation databases in order to present a comprehensive resource for functional sequence annotation. However, this integration causes inconsistent and/or redundant annotations in some cases. In this study, we propose an unsupervised method for the automatic detection of inconsistent and redundant entries in the InterPro database. Two clustering methods, the Markov Cluster Algorithm (MCL) and hierarchical clustering, are employed to investigate to what extent such problematic signatures can be detected. Results show that a considerable fraction (~75%) of the redundant entries can be identified. Our future goal is to develop a supervised machine learning system that identifies redundant and inconsistent signatures with very high performance. The findings of this study may help InterPro curators fix problematic entries, and curators may also use them as a road map before integrating new signatures. |
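
The abstract names MCL and hierarchical clustering but does not say which features or parameters were used. The following is a minimal sketch under stated assumptions: each entry is represented by the set of proteins it annotates, and redundancy is assumed to show up as high Jaccard overlap between those sets. The entry IDs, protein accessions, inflation value (2.0) and distance cut-off (0.5) are hypothetical, and single linkage is only one possible linkage choice for the hierarchical step. Entries that land in the same multi-member cluster would be flagged as candidate redundant entries for curator review.

```python
"""Hypothetical sketch: cluster InterPro-like entries by the overlap of the
protein sets they annotate, using MCL and single-linkage clustering.
Entry IDs, protein sets, inflation and cut-off are illustrative assumptions."""
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(entries):
    """Sorted entry names plus a symmetric Jaccard matrix with self-loops of 1.0."""
    names = sorted(entries)
    n = len(names)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as is customary before running MCL
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = jaccard(entries[names[i]], entries[names[j]])
    return names, m


def normalize_columns(m):
    """Make every non-zero column sum to 1 (column-stochastic), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(m, inflation=2.0, iterations=50, tol=1e-9):
    """Markov Cluster Algorithm: alternate expansion (matrix squaring) and
    inflation (element-wise power, then column normalization) until stable."""
    n = len(m)
    m = normalize_columns([row[:] for row in m])
    for _ in range(iterations):
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


def single_linkage(m, max_distance=0.5):
    """Single-linkage clustering cut at max_distance (distance = 1 - Jaccard).
    Cutting a single-linkage dendrogram at a height equals taking the connected
    components of the graph of pairs within that distance (union-find here)."""
    parent = list(range(len(m)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(m)), 2):
        if 1.0 - m[i][j] <= max_distance:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(m)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical entries: A/B and C/D annotate heavily overlapping protein sets.
entries = {
    "ENTRY_A": {"P1", "P2", "P3", "P4"},
    "ENTRY_B": {"P1", "P2", "P3", "P5"},
    "ENTRY_C": {"P7", "P8", "P9"},
    "ENTRY_D": {"P7", "P8", "P9", "P10"},
    "ENTRY_E": {"P20"},
}
names, sim = similarity_matrix(entries)
mcl_groups = [[names[i] for i in c] for c in mcl(sim)]
hc_groups = [[names[i] for i in c] for c in single_linkage(sim)]
print("MCL:", mcl_groups)
print("single linkage:", hc_groups)
```

On this toy input both methods group ENTRY_A with ENTRY_B and ENTRY_C with ENTRY_D, leaving ENTRY_E alone. The inflation parameter controls MCL granularity (higher values give smaller clusters), while the distance cut-off plays the analogous role for the hierarchical step.
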
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2808719.2811430 | BCB |
Field | DocType | Citations |
---|---|---|
Data mining, Computer science, Road map, Artificial intelligence, Cluster analysis, InterPro, Hierarchical clustering, Annotation, Pattern recognition, Markov chain, Bioinformatics, Hidden Markov model, Simple Modular Architecture Research Tool, Database | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 2 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ahmet Süreyya Rifaioglu | 1 | 0 | 0.34 |
Tunca Dogan | 2 | 21 | 3.00 |
Tolga Can | 3 | 268 | 16.39 |