Abstract | ||
---|---|---|
Stochastic complexity of a data set is defined as the shortest possible code length for the data ob- tainable by using some fixed set of models. This measure is of great theoretical and practical im- portance as a tool for tasks such as model selec- tion or data clustering. Unfortunately, comput- ing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complex- ity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multi- nomial data sets, the exponentiality can be re- moved without loss of accuracy. We also intro- duce a new computationally efficient approxima- tion scheme based on analytic combinatorics and assess its accuracy, together with earlier approx- imations, by comparing them to the exact form. The results suggest that due to its accuracy and efficiency, the new sharper approximation will be useful for a wide class of problems with discrete data. |
Year | Venue | Keywords |
---|---|---|
2003 | AISTATS | analytic combinatorics,data clustering |
DocType | Citations | PageRank |
Conference | 3 | 0.64 |
References | Authors | |
12 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Petri Kontkanen | 1 | 255 | 30.36 |
Wray L. Buntine | 2 | 2271 | 579.82 |
Petri Myllymaki | 3 | 69 | 9.84 |
Jorma Rissanen | 4 | 1665 | 798.14 |
Henry Tirri | 5 | 691 | 145.38 |