Title
Rejoinder
Abstract
In view of my discussion of the paper by Wallace and Dowe with an outline of the minimum description length (MDL) principle, of which the minimum message length (MML) principle is seen to be a primitive, restricted and inferior implementation, I do not wish to go into a detailed rebuttal of the numerous issues raised in Professor Wallace's lengthy discussion. Rather, I will summarize the salient features of the MDL principle which still appear to cause confusion, and address only the relevant points also raised by the other discussants.

The MDL principle requires two things: one or more suggested classes of models, given as probability measures or codes, and the observed data. The objective is to calculate a yardstick, namely the length of a code for the data provided by the model classes, by which the classes and the models in them can be compared, and to find the best in light of the data. Hence, the task of selecting the collection of classes itself lies beyond the principle. Accordingly, the shortest code length found will depend not only on the data but also on the model classes suggested. To avoid the non-computability problem of finding the shortest code length relative to each class, none of the classes may include the set of all computable models. This, however, creates the difficulty that the idea of the shortest code length called for by the principle cannot be taken in the literal sense. Rather, it will be shortest in certain probabilistic senses, which amounts to being shortest for all typical strings generated by almost all models in each class considered. This suffices in practice.

Prior knowledge is used in the principle to guide the selection of the suggested model classes, and it may include a distribution for the parameters, called a 'prior' in Bayesian analysis. However, unlike for Bayesians, for whom the quality of such prior knowledge cannot be contested (for them there is no such thing as 'bad' or 'good' prior knowledge), its role in the MDL principle is only tentative: a prior is good if it helps to reduce the code length; otherwise it is worthless and should not be used. This is as it must be in the MDL principle, for admitting unqualified prior knowledge would create a conflict with the principle. With this proviso the MDL principle does admit the use of prior information in the form of distributions for parameters. This can be accomplished either by replacing the 'canonical' prior in the normalized maximum likelihood (NML) code with any desired prior, or by calculating a mixture density, often but misleadingly called a 'Bayesian mixture', as discussed in Professor Clarke's Section 2, The Bayesian Connection. Its
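To make the code-length comparison concrete, the following minimal Python sketch (an illustration, not part of the rejoinder) works out both quantities for the Bernoulli model class: the NML code length, whose normalizer corresponds to the 'canonical' prior, and the code length of a mixture density under an assumed Beta(a, b) prior. The choice of model class, the Beta prior and all function names are assumptions made for this example.

import math

def bernoulli_ml(k, n):
    # Maximized Bernoulli likelihood p(x^n; theta_hat) for a binary
    # string of length n containing k ones (0^0 is taken as 1).
    if k == 0 or k == n:
        return 1.0
    p = k / n
    return p ** k * (1 - p) ** (n - k)

def nml_code_length(k, n):
    # NML code length -log2 [ p(x^n; theta_hat) / C_n ], where the
    # normalizer C_n sums the maximized likelihood over all 2^n strings,
    # grouped by their count of ones (comb(n, j) strings per count j).
    norm = sum(math.comb(n, j) * bernoulli_ml(j, n) for j in range(n + 1))
    return -math.log2(bernoulli_ml(k, n) / norm)

def mixture_code_length(k, n, a=1.0, b=1.0):
    # Code length -log2 of the mixture (marginal) density under an assumed
    # Beta(a, b) prior: the integral of theta^k (1 - theta)^(n - k) against
    # the prior equals B(k + a, n - k + b) / B(a, b) for a fixed string.
    log_p = (math.lgamma(k + a) + math.lgamma(n - k + b)
             - math.lgamma(n + a + b)
             - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    return -log_p / math.log(2)

# Example: 20 binary observations with 15 ones; the shorter code wins.
print("NML     :", round(nml_code_length(15, 20), 3))
print("Mixture :", round(mixture_code_length(15, 20), 3))

Under the MDL reading in the text above, the Beta prior would be retained only if it yields a shorter code length than the alternatives for the data at hand; otherwise it is discarded.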
Year
1999
DOI
10.1093/comjnl/42.4.343
Venue
Comput. J.
DocType
Journal
Volume
42
Issue
4
Citations
1
PageRank
0.37
References
1
Authors
1
Name
Jorma Rissanen
Order
1
Citations
16657
PageRank
98.14