Abstract | ||
---|---|---|
Bayesian Dark Knowledge is a method for compressing the posterior predictive distribution of a neural network model into a more compact form. Specifically, the method attempts to compress a Monte Carlo approximation to the parameter posterior into a single network representing the posterior predictive distribution. Its authors further show that the approach succeeds in the classification setting when the student network's architecture matches that of a single network in the teacher ensemble. In this work, we examine the robustness of Bayesian Dark Knowledge to higher levels of posterior uncertainty. We show that a student network matching the teacher architecture may fail to yield acceptable performance in this setting. We study an approach to closing the resulting performance gap by increasing student model capacity. |
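
The compression step described in the abstract can be illustrated with a minimal sketch. It assumes a classification setting, approximates the posterior predictive by averaging the softmax outputs of a few sampled teacher networks, and scores a student by cross-entropy against that average. The function names and logit values are hypothetical, and the sketch does not cover how posterior samples are drawn or how the student is optimized.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mc_posterior_predictive(teacher_logits):
    """Average class probabilities over S sampled teacher networks.

    teacher_logits holds S logit vectors, one per posterior sample,
    all evaluated at the same input.
    """
    probs = [softmax(logits) for logits in teacher_logits]
    n_samples = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_samples for c in range(n_classes)]

def distillation_loss(teacher_probs, student_logits):
    """Cross-entropy between the teacher predictive and the student softmax."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical logits from three posterior samples, one input, three classes.
teacher_logits = [[2.0, 0.5, -1.0], [0.3, 1.8, -0.5], [1.0, 1.0, 0.0]]
target = mc_posterior_predictive(teacher_logits)

# A student that reproduces the target exactly attains the minimum loss
# (the entropy of the target); a uniform student does no better than that.
loss_matched = distillation_loss(target, [math.log(p) for p in target])
loss_uniform = distillation_loss(target, [0.0, 0.0, 0.0])
```

When the sampled teachers disagree strongly, the averaged target becomes less peaked than any single teacher's output, which is one way to picture the higher posterior uncertainty that the abstract refers to.
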
Field | Value |
---|---|
Year | 2019 |
Venue | CoRR |
DocType | Journal |
Volume | abs/1906.01724 |
Citations | 0 |
PageRank | 0.34 |
References | 0 |
Authors | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michael Kuperberg | 1 | 7589 | 529.66 |
Benjamin M. Marlin | 2 | 0 | 2.37 |