Title |
---|
AlignBucket: a tool to speed up ‘all-against-all’ protein sequence alignments optimizing length constraints |
Abstract |
---|
Motivation: The next-generation sequencing era requires reliable, fast and efficient approaches for the accurate annotation of the ever-increasing number of biological sequences and their variations. Transferring annotation on the basis of a similarity search is a standard approach. All-against-all protein comparison is a preliminary step of several available methods that annotate sequences using information already present in databases. Given the current volume of sequences, methods are needed to pre-process the data and reduce the time spent on sequence comparison. Results: We present an algorithm that optimizes the partition of a large volume of sequences (the whole database) into sets where sequence length values (in residues) are constrained depending on a bounded minimal and expected alignment coverage. The idea is to optimally group protein sequences according to their length, and then compute the all-against-all sequence alignments only among sequences that fall in a selected length range. We describe a mathematically optimal solution and show that our method leads to a 5-fold speed-up in real-world cases. |
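
The length constraint described in the abstract can be illustrated with a small sketch. This is not the AlignBucket partitioning algorithm itself, only a naive length filter written under stated assumptions: coverage is taken to be the fraction of the longer sequence that an alignment spans, so two sequences with lengths `short <= long` can reach a coverage of `min_coverage` only if `short >= min_coverage * long`. The function name `candidate_pairs` and its parameters are hypothetical.

```python
def candidate_pairs(lengths, min_coverage):
    """Return index pairs (i, j) whose lengths permit the required coverage.

    Assumption: coverage = aligned length / length of the longer sequence,
    so a pair is admissible only if shorter >= min_coverage * longer.
    This is a plain length filter, not the optimal partition from the paper.
    """
    # Sort sequence indices by length so admissible partners are contiguous.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    pairs = []
    for pos, i in enumerate(order):
        shorter = lengths[i]
        for other in order[pos + 1:]:
            longer = lengths[other]
            # Every later sequence is at least as long, so stop at the first failure.
            if longer * min_coverage > shorter:
                break
            pairs.append((i, other))
    return pairs


if __name__ == "__main__":
    # Four toy sequence lengths (in residues) and a 90% coverage requirement.
    toy_lengths = [100, 105, 200, 50]
    kept = candidate_pairs(toy_lengths, 0.9)
    total = len(toy_lengths) * (len(toy_lengths) - 1) // 2
    print(f"{len(kept)} of {total} pairs need to be aligned: {kept}")
```

With these toy lengths only the 100- and 105-residue sequences are compared; every other pair is skipped before any alignment is computed, which is where the time saving comes from.
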
Year | DOI | Venue |
---|---|---|
2015 | 10.1093/bioinformatics/btv451 | BIOINFORMATICS |
Field | DocType | Volume |
---|---|---|
Data mining, Protein sequencing, Computer science, Bioinformatics, Speedup | Journal | 31 |
Issue | ISSN | Citations |
---|---|---|
23 | 1367-4803 | 1 |
PageRank | References | Authors |
---|---|---|
0.35 | 7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Giuseppe Profiti | 1 | 23 | 3.68 |
Piero Fariselli | 2 | 851 | 96.03 |
Rita Casadio | 3 | 1032 | 108.10 |