Variable-length intervals in homology search - Citegraph

Paper Info

Title
Variable-length intervals in homology search

Abstract
Fast, accurate, and scalable techniques for homology search of large genomic collections are an increasingly important requirement, as genomic sequence collections continue to double in size almost every year. Almost all homology search techniques begin query evaluation by extracting fixed-length overlapping subsequences from the query and database sequences and comparing them; well-known tools such as FASTA and BLAST, and our own CAFE technique, all work this way. In this paper we describe a novel, variable-length approach to extracting subsequences that is based on homology scoring matrices. Our aim is to balance the speed and accuracy of fixed-length choices, that is, to combine the speed of longer subsequences with the accuracy of shorter ones. We show that incorporating this approach into CAFE yields a good compromise between accuracy and retrieval efficiency when searching with BLOSUM matrices sensitive to distant evolutionary relationships. We expect the same results to be achieved with other homology search techniques.
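The abstract says that FASTA, BLAST, and CAFE all start by extracting fixed-length overlapping subsequences (intervals) and comparing them. Below is a minimal Python sketch of that first step, written under our own assumptions. The function names, the dictionary index, and the exact-match hit counting are hypothetical and only illustrate the idea. They are not CAFE's actual index structure or scoring. The paper's variable-length, scoring-matrix-based extraction is not reproduced here.

```python
from collections import defaultdict


def fixed_length_intervals(seq: str, k: int) -> list[tuple[int, str]]:
    """Return every overlapping length-k interval of seq with its start offset."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no intervals (the range is empty).
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_index(database: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Hypothetical inverted index: interval -> list of (sequence id, offset)."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for seq_id, seq in database.items():
        for offset, interval in fixed_length_intervals(seq, k):
            index[interval].append((seq_id, offset))
    return dict(index)


def count_hits(query: str, index: dict[str, list[tuple[str, int]]], k: int) -> dict[str, int]:
    """Count exact interval matches between the query and each database sequence."""
    hits: dict[str, int] = defaultdict(int)
    for _, interval in fixed_length_intervals(query, k):
        for seq_id, _ in index.get(interval, []):
            hits[seq_id] += 1
    return dict(hits)


if __name__ == "__main__":
    # Toy protein fragments, invented for illustration only.
    db = {"s1": "MKVLAAGIV", "s2": "MKVWAAGLV"}
    query = "MKVLAAGLV"
    for k in (2, 4):
        print(k, count_hits(query, build_index(db, k), k))
```

With these toy sequences, k = 2 gives each database sequence 6 shared intervals with the query, while k = 4 gives 4 for `s1` and 2 for `s2`. Longer intervals produce fewer index hits to process, but a single substituted residue removes every interval that covers it, and longer intervals cover more positions. This is the speed/accuracy trade-off between fixed-length choices that the variable-length approach aims to balance.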

Year: 2004
Venue: APBC
Keywords: fixed-length overlapping sequence, large genomic collection, genomic sequence collection, variable-length interval, variable-length approach, cafe technique, homology scoring matrix, own cafe technique, scalable search technique, homology search technique, fixed-length choice, efficiency, genome sequence
Field: Data mining, Matrix (mathematics), BLOSUM, Homology (biology), Bioinformatics, Subsequence, Mathematics, Scalability
DocType: Conference
Citations: 3
PageRank: 0.60
References: 9
Authors: 2

Authors

Name	Order	Citations	PageRank
Abhijit Chattaraj	1	23	2.59
Hugh E. Williams	2	1048	93.45
