Title
Phylogeny construction with rigid gapped motifs.
Abstract
Patterns with gaps have traditionally been used as signatures of protein families or as features in binary classification. Current alignment-free algorithms construct phylogenies by comparing the repertoire and frequency of ungapped blocks in genomes and proteomes. In this article, we measure the quality of phylogenies reconstructed by comparing suitably defined sets of gapped motifs that occur in mitochondrial proteomes. We study the dependence between the quality of reconstructed phylogenies and the density, number of solid characters, and statistical significance of gapped motifs. We consider maximal motifs, as well as some of their compact generators. The average performance of suitably defined sets of gapped motifs is comparable to that of popular string-based alignment-free methods. Extremely long and sparse motifs produce phylogenies of the same or better quality than those produced by short and dense motifs. The best phylogenies are produced by motifs with 3 or 4 solid characters, while increasing the number of solid characters degrades phylogenies. Discarding motifs with low statistical significance degrades performance as well. In maximal motifs, moving from the smallest basis to bases with higher redundancy leads to better phylogenies.
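To make the abstract's vocabulary concrete, the following minimal Python sketch illustrates rigid gapped motifs, their solid characters and density, and one simple way to compare two sequences by their motif repertoire. It is not the authors' method: the wildcard symbol `.`, all function names, the exhaustive enumeration of motifs with solid endpoints, and the Jaccard distance are assumptions chosen only for illustration.

```python
from itertools import combinations

WILDCARD = "."  # assumed don't-care symbol; a rigid motif has fixed-length gaps


def solid_count(motif):
    """Number of solid (non-wildcard) characters in the motif."""
    return sum(1 for c in motif if c != WILDCARD)


def density(motif):
    """Fraction of motif positions that are solid characters."""
    return solid_count(motif) / len(motif)


def matches_at(motif, seq, i):
    """True if the motif matches seq starting at position i."""
    window = seq[i:i + len(motif)]
    return len(window) == len(motif) and all(
        m == WILDCARD or m == s for m, s in zip(motif, window)
    )


def count_occurrences(motif, seq):
    """Number of (possibly overlapping) occurrences of the motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if matches_at(motif, seq, i))


def motif_set(seq, length, solids):
    """All rigid motifs of the given length with `solids` solid characters
    (first and last position always solid) that occur in seq.
    Hypothetical helper; assumes 2 <= solids <= length."""
    found = set()
    inner = range(1, length - 1)
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        for kept in combinations(inner, solids - 2):
            keep = {0, length - 1, *kept}
            found.add("".join(c if j in keep else WILDCARD
                              for j, c in enumerate(window)))
    return found


def jaccard_distance(a, b):
    """Illustrative repertoire distance: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

For example, `motif_set(proteome_a, 8, 3)` and `motif_set(proteome_b, 8, 3)` (hypothetical input strings) could be passed to `jaccard_distance` to fill one entry of a pairwise distance matrix, which a standard distance-based tree builder would then turn into a phylogeny. The study itself also weighs motif frequency and statistical significance, which this sketch leaves out.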
Year: 2012
DOI: 10.1089/cmb.2012.0060
Venue: JOURNAL OF COMPUTATIONAL BIOLOGY
Keywords: alignment-free sequence comparison, gapped motifs, mitochondrial proteomes, motif composition
Field: Genome, Protein family, Binary classification, Repertoire, Bioinformatics, Phylogenetics, Mathematics
DocType: Journal
Volume: 19
Issue: 7
ISSN: 1066-5277
Citations: 3
PageRank: 0.42
References: 39
Authors: 2
Name, Order, Citations, PageRank
Fabio Cunial, 1, 72, 9.68
Alberto Apostolico, 2, 1441, 182.20