Title
An assessment of substitution scores for protein profile-profile comparison.
Abstract
Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied.
A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales.
Contact: altschul@ncbi.nlm.nih.gov
Supplementary data are available at Bioinformatics online.
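The evaluation measure and the best-performing score described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the log-odds form of the profile-sequence score, the pooling of scores across columns, and the convention of counting ties as one half are all assumptions made for illustration. Column frequencies are assumed to be strictly positive (for example, after pseudocounts).

```python
import math
from itertools import product


def symmetrized_mean_score(p, q, background):
    """Symmetrized mean profile-sequence score for two profile columns.

    Assumed form: the log-odds scores of one column, averaged over the
    residue frequencies of the other, then averaged over both directions.
    All frequencies are assumed strictly positive.
    """
    def mean_score(target, residues):
        # Expected log-odds score of `target` against residues drawn from `residues`.
        return sum(r * math.log(t / b)
                   for t, r, b in zip(target, residues, background))

    return 0.5 * (mean_score(p, q) + mean_score(q, p))


def prob_related_scores_higher(related_scores, unrelated_scores):
    """Estimate P(score with a related column > score with an unrelated column).

    Simplification: scores are pooled over all pairs; ties count as 0.5.
    """
    wins = 0.0
    for r, u in product(related_scores, unrelated_scores):
        if r > u:
            wins += 1.0
        elif r == u:
            wins += 0.5
    return wins / (len(related_scores) * len(unrelated_scores))


# Toy two-letter alphabet (hypothetical data, for illustration only).
background = [0.5, 0.5]
col = [0.9, 0.1]
related = [0.8, 0.2]
unrelated = [0.1, 0.9]

s_rel = symmetrized_mean_score(col, related, background)
s_unrel = symmetrized_mean_score(col, unrelated, background)
print(prob_related_scores_higher([s_rel], [s_unrel]))
```

A value near 1 means the score almost always ranks related columns above unrelated ones, while 0.5 corresponds to no discrimination. Because the measure only compares substitution scores, no gap or offset costs need to be fitted, as the abstract notes.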
Year
2011
DOI
10.1093/bioinformatics/btr565
Venue
Bioinformatics
Keywords
general form substitution score, higher substitution score, gap score, pairwise protein sequence alignment, profile substitution score, protein profile, substitution score, gold-standard alignment column, profile comparison, amino acid, profile column yield, multiple alignment, computational biology, proteins, amino acid sequence, sequence alignment, probability
Field
Sequence alignment, Pairwise comparison, Protein sequencing, Symmetrization, Bioinformatics, Statistics, Offset (computer science), Mathematics
DocType
Journal
Volume
27
Issue
24
ISSN
1367-4811
Citations
4
PageRank
0.45
References
25
Authors
3
Name                Order  Citations  PageRank
Xugang Ye           1      20         4.40
Guoli Wang          2      4          0.45
Stephen F Altschul  3      180        26.55