Title
MetaDomain: a profile HMM-based protein domain classification tool for short sequences.
Abstract
Protein homology search provides the basis for functional profiling in metagenomic annotation. Profile HMM-based methods classify reads into annotated protein domain families and can achieve better sensitivity for remote protein homology search than pairwise sequence alignment. However, their sensitivity deteriorates as read length decreases. As a result, a large number of short reads cannot be classified into their native domain families. In this work, we introduce MetaDomain, a protein domain classification tool designed for short reads generated by next-generation sequencing technologies. MetaDomain uses relaxed position-specific score thresholds to align more reads to a profile HMM, and uses the distribution of alignment positions as an additional constraint to control false positive matches. We apply MetaDomain to the transcriptomic data of a bacterial genome and to a soil metagenomic data set. The experimental results show that it can achieve better sensitivity than the state-of-the-art profile HMM alignment tool in identifying encoded domains from short sequences. The source code of MetaDomain is available at http://sourceforge.net/projects/metadomain/.
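The two-stage idea in the abstract (accept more hits with relaxed position-specific thresholds, then use where the hits fall on the model as a check) can be pictured with a toy sketch. This is not MetaDomain's algorithm or code: the `Hit` record, the rule of summing per-position thresholds, and the use of model coverage as a stand-in for the "distribution of alignment positions" constraint are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record for one read aligned to a profile HMM."""
    read_id: str
    start: int    # first model position covered (1-based, inclusive)
    end: int      # last model position covered (1-based, inclusive)
    score: float  # alignment score of the read against the model


def passes_relaxed_threshold(hit: Hit, per_position_threshold: List[float]) -> bool:
    # Assumption: a hit is kept when its score reaches the sum of the
    # thresholds of the model positions it covers.
    required = sum(per_position_threshold[hit.start - 1:hit.end])
    return hit.score >= required


def model_coverage(hits: List[Hit], model_length: int) -> float:
    # Fraction of model positions covered by at least one hit.
    covered = [False] * model_length
    for hit in hits:
        for pos in range(hit.start - 1, hit.end):
            covered[pos] = True
    return sum(covered) / model_length


def classify(hits: List[Hit], per_position_threshold: List[float],
             min_coverage: float) -> List[Hit]:
    # Stage 1: relaxed, position-specific score filter.
    kept = [h for h in hits if passes_relaxed_threshold(h, per_position_threshold)]
    # Stage 2 (stand-in constraint): reject the whole family call when the
    # kept hits cover too little of the model.
    if model_coverage(kept, len(per_position_threshold)) < min_coverage:
        return []
    return kept


thresholds = [0.5] * 10  # hypothetical model with 10 positions
hits = [
    Hit("r1", 1, 4, 3.0),
    Hit("r2", 4, 8, 2.6),
    Hit("r3", 7, 10, 1.0),
]
accepted = classify(hits, thresholds, min_coverage=0.7)
print([h.read_id for h in accepted])
```

In this sketch, lowering the per-position thresholds admits more short reads in the first stage, while the coverage check in the second stage is what keeps scattered spurious hits from producing a family call.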
Year: 2012
Venue: Biocomputing-Pacific Symposium on Biocomputing
Keywords: Protein domain classification, metagenomics, short reads, profile HMM
Field: Annotation, Protein domain, Pattern recognition, Biology, Profiling (computer programming), Source code, Pairwise sequence alignment, Metagenomics, Artificial intelligence, Hidden Markov model, Bacterial genome size
DocType: Conference
ISSN: 2335-6936
Citations: 2
PageRank: 0.39
References: 0
Authors: 2
Name | Order | Citations | PageRank
Yuan Zhang | 1 | 2 | 1.07
Yanni Sun | 2 | 219 | 21.16