BLAST Tree: Fast Filtering for Genomic Sequence Classification - Citegraph

Paper Info

Title
BLAST Tree: Fast Filtering for Genomic Sequence Classification

Abstract
With the advent of next-generation sequencing and culture-independent methods, we are now accumulating an enormous amount of metagenomic data from microbial communities. These data sets are large, hard to assemble, and might encode rare or novel proteins, posing new computational challenges for protein homology search. This paper presents a novel protein homology search algorithm that combines the salient features of pairwise sequence alignment programs such as BLAST and protein-family-based tools such as HMMER. It is optimized for protein annotation in metagenomic data sets because: 1) it is fast, 2) it can classify short protein fragments encoded by individual sequence reads, and 3) it can find homologs to novel or rare protein families when there are not enough member sequences to build a probabilistic model. Our algorithm builds a new indexing data structure called BlastTree, which can index a large sequence family database because of our effective compression techniques. In addition, BlastTree fully exploits sequence family membership information to improve homology search sensitivity. When the BlastTree search algorithm is incorporated into HMMER, it runs in a fraction of the time with comparable quality.

Year	2010
DOI	10.1109/BIBE.2010.74
Venue	BioInformatics and BioEngineering
DocType	Conference
ISBN	978-1-4244-7494-3
Citations	0
PageRank	0.34
References	11
Authors	4
Keywords	short protein, rare protein family, protein annotation, individual sequence, protein homology search, novel protein homology search, enough member sequence, protein family, novel protein, fast filtering, genomic sequence classification, blast tree, microbial community, next generation sequencing, metagenomic, probability, component, data structure, homology, trie, genomics, genome sequence, indexation, indexing, hidden markov models, blast, data structures, classification algorithms, search algorithm, bioinformatics, proteins, probabilistic model, sensitivity
Field	Data structure, ENCODE, Protein family, Search algorithm, Pattern recognition, Computer science, Search engine indexing, Artificial intelligence, Protein Annotation, Bioinformatics, Multiple sequence alignment, Trie

Authors (4 rows)

Name	Order	Citations	PageRank
Stuart King	1	3	1.10
Yanni Sun	2	219	21.16
James Cole	3	1	0.70
Sakti Pramanik	4	770	204.19

Cited by (0 rows)

References (11 rows)
