AlignBucket: a tool to speed up ‘all-against-all’ protein sequence alignments optimizing length constraints - Citegraph

Paper Info

Title
AlignBucket: a tool to speed up ‘all-against-all’ protein sequence alignments optimizing length constraints

Abstract
Motivation: The next-generation sequencing era requires reliable, fast and efficient approaches for the accurate annotation of the ever-increasing number of biological sequences and their variations. Transfer of annotation upon similarity search is a standard approach. All-against-all protein comparison is a preliminary step of several available methods that annotate sequences based on information already present in databases. Given the current volume of sequences, methods are needed that pre-process the data to reduce sequence comparison time.

Results: We present an algorithm that optimizes the partition of a large volume of sequences (the whole database) into sets where sequence length values (in residues) are constrained depending on a bounded minimal and expected alignment coverage. The idea is to optimally group protein sequences according to their length, and then compute the all-against-all sequence alignments among sequences that fall in a selected length range. We describe a mathematically optimal solution and show that our method leads to a 5-fold speed-up in real-world cases.
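
The length-constraint idea in the abstract can be illustrated with a small sketch. This is not the optimal partitioning algorithm the paper describes; it is a plain length-ratio filter, written under the assumption that coverage is measured as the shorter sequence length divided by the longer one. The function name `candidate_pairs` and the parameter `min_coverage` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_left


def candidate_pairs(lengths, min_coverage):
    """Return index pairs (i, j) whose length ratio shorter/longer is at
    least min_coverage. Pairs outside this ratio cannot reach the required
    coverage, so they can be skipped before any alignment is computed.

    Assumption (not from the paper): coverage = shorter length / longer length.
    """
    # Sort sequence indices by length so compatible partners are contiguous.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    sorted_lengths = [lengths[i] for i in order]

    pairs = []
    for pos, longer in enumerate(sorted_lengths):
        # First position among the shorter sequences that is long enough
        # to cover min_coverage of the current (longer) sequence.
        first = bisect_left(sorted_lengths, min_coverage * longer, 0, pos)
        for k in range(first, pos):
            pairs.append((order[k], order[pos]))
    return pairs


if __name__ == "__main__":
    lengths = [100, 95, 50, 200, 190]  # toy sequence lengths in residues
    kept = candidate_pairs(lengths, 0.9)
    total = len(lengths) * (len(lengths) - 1) // 2
    print(f"{len(kept)} of {total} pairs need alignment")
```

On this toy input only the pairs with similar lengths survive the filter, which is the source of the time saving: alignments that could never satisfy the coverage bound are not run at all.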

Year	2015
DOI	10.1093/bioinformatics/btv451
Venue	Bioinformatics
Field	Data mining, Protein sequencing, Computer science, Bioinformatics, Speedup
DocType	Journal
Volume	31
Issue	23
ISSN	1367-4803
Citations	1
PageRank	0.35
References	7
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Giuseppe Profiti	1	23	3.68
Piero Fariselli	2	851	96.03
Rita Casadio	3	1032	108.10
