Title
Log-odds sequence logos.
Abstract
Motivation: DNA and protein patterns are usefully represented by sequence logos. However, the methods for logo generation in common use lack a proper statistical basis, and are non-optimal for recognizing functionally relevant alignment columns.
Results: We redefine the information at a logo position as a per-observation multiple alignment log-odds score. Such scores are positive or negative, depending on whether a column's observations are better explained as arising from relatedness or chance. Within this framework, we propose distinct normalized maximum likelihood and Bayesian measures of column information. We illustrate these measures on High Mobility Group B (HMGB) box proteins and a dataset of enzyme alignments. Particularly in the context of protein alignments, our measures improve the discrimination of biologically relevant positions.
Year
2015
DOI
10.1093/bioinformatics/btu634
Venue
BIOINFORMATICS
Field
Position-Specific Scoring Matrices, Data mining, Sequence logo, Alignment-free sequence analysis, Computer science, Logo, Bioinformatics, Odds, Multiple sequence alignment, Molecular Sequence Annotation, Bayes' theorem
DocType
Journal
Volume
31
Issue
3
ISSN
1367-4803
Citations
0
PageRank
0.34
References
11
Authors
5
Name
Order
Citations
PageRank
Yi-Kuo Yu114014.43
John A. Capra222713.07
Aleksandar Stojmirović3857.85
David Landsman461878.83
Stephen F Altschul518026.55