Title
On the inference of Dirichlet mixture priors for protein sequence comparison.
Abstract
Dirichlet mixtures provide an elegant formalism for constructing and evaluating protein multiple sequence alignments. Their use requires the inference of Dirichlet mixture priors from curated sets of accurately aligned sequences. This article addresses two questions relevant to such inference: how many components should a Dirichlet mixture have, and how can a maximum-likelihood mixture be derived from a given data set? To apply the Minimum Description Length principle to the first question, we extend an analytic formula for the complexity of a Dirichlet model to Dirichlet mixtures by informal argument. We apply a Gibbs-sampling-based approach to the second question. Using artificial data generated by a Dirichlet mixture, we demonstrate that our methods closely approximate the true theory when one exists. We also apply our methods to real data, and infer Dirichlet mixtures that describe the data better than a mixture derived using previous approaches.
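As a rough illustration of the formalism the abstract refers to, the sketch below scores one alignment column (a vector of residue counts) under a Dirichlet mixture, using the standard Dirichlet-multinomial formula. It is not the authors' code: the function names, the toy two-letter alphabet, and the parameter values are assumptions chosen for illustration, and the Gibbs-sampling and Minimum Description Length steps are not shown.

```python
from math import lgamma, log, exp

def log_dirichlet_multinomial(counts, alpha):
    """Log probability of a count vector under one Dirichlet component
    (Dirichlet-multinomial, including the multinomial coefficient)."""
    n = sum(counts)
    a = sum(alpha)
    # log of the multinomial coefficient n! / prod(c_i!)
    result = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # log of Gamma(a) / Gamma(a + n)
    result += lgamma(a) - lgamma(a + n)
    # log of prod Gamma(alpha_i + c_i) / Gamma(alpha_i)
    result += sum(lgamma(ai + c) - lgamma(ai) for ai, c in zip(alpha, counts))
    return result

def component_log_terms(counts, weights, alphas):
    """log(weight_j) + log P(counts | component j) for every component j."""
    return [log(w) + log_dirichlet_multinomial(counts, al)
            for w, al in zip(weights, alphas)]

def mixture_log_likelihood(counts, weights, alphas):
    """Log probability of a count vector under the whole mixture,
    combined with a log-sum-exp for numerical stability."""
    terms = component_log_terms(counts, weights, alphas)
    m = max(terms)
    return m + log(sum(exp(t - m) for t in terms))

def component_posteriors(counts, weights, alphas):
    """Posterior probability of each component given the observed counts."""
    terms = component_log_terms(counts, weights, alphas)
    total = mixture_log_likelihood(counts, weights, alphas)
    return [exp(t - total) for t in terms]

# Hypothetical two-component mixture over a toy two-letter alphabet.
weights = [0.6, 0.4]
alphas = [[5.0, 0.5], [0.5, 5.0]]
column = [7, 1]  # observed residue counts in one alignment column

print(mixture_log_likelihood(column, weights, alphas))
print(component_posteriors(column, weights, alphas))
```

Summing `mixture_log_likelihood` over all columns of a data set gives the quantity a maximum-likelihood fit would try to increase. For real protein data each `alpha` vector would have 20 entries, one per amino acid.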
Year: 2011
DOI: 10.1089/cmb.2011.0040
Venue: Journal of Computational Biology
Keywords: algorithms, combinatorics, linear programming, machine learning, statistics
Field: Hierarchical Dirichlet process, Latent Dirichlet allocation, Inference, Minimum description length, Artificial intelligence, Linear programming, Formalism (philosophy), Dirichlet distribution, Prior probability, Machine learning, Mathematics
DocType: Journal
Volume: 18
Issue: 8
ISSN: 1066-5277
Citations: 3
PageRank: 0.44
References: 3
Authors: 3
Name | Order | Citations | PageRank
Xugang Ye | 1 | 20 | 4.40
Yi-Kuo Yu | 2 | 140 | 14.43
Stephen F Altschul | 3 | 180 | 26.55