Title
Fast kernel conditional density estimation: A dual-tree Monte Carlo approach
Abstract
We describe a fast, data-driven bandwidth selection procedure for kernel conditional density estimation (KCDE). Specifically, we give a Monte Carlo dual-tree algorithm for efficient, error-controlled approximation of a cross-validated likelihood objective. While exact evaluation of this objective has an unscalable O(n^2) computational cost, our method is practical and shows speedup factors as high as 286,000 when applied to real multivariate datasets containing up to one million points. In absolute terms, computation times are reduced from months to minutes. This enables applications at much greater scale than previously possible. The core idea in our method is to first derive a standard deterministic dual-tree approximation, whose loose deterministic bounds we then replace with tight, probabilistic Monte Carlo bounds. The resulting Monte Carlo dual-tree algorithm exhibits strong error control and high speedup across a broad range of datasets several orders of magnitude greater in size than those reported in previous work. The cost of this high acceleration is the loss of the formal error guarantee of the deterministic dual-tree framework; however, our experiments show that error is still amply controlled by our Monte Carlo algorithm, and the many-order-of-magnitude speedups are worth this sacrifice in the large-data case, where cross-validated bandwidth selection for KCDE would otherwise be impractical.
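The objective described in the abstract can be illustrated with a minimal sketch. It assumes one-dimensional x and y, Gaussian kernels, and a leave-one-out log-likelihood score; none of these details are given in the abstract. The function names `gauss` and `loo_log_likelihood` are hypothetical. The optional subsampling is plain uniform Monte Carlo sampling of the inner sums, not the paper's dual-tree algorithm, and it carries no error control.

```python
import math
import random
import sys

TINY = sys.float_info.min  # floor that keeps log() and division well defined


def gauss(u, h):
    """One-dimensional Gaussian kernel with bandwidth h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def loo_log_likelihood(xs, ys, hx, hy, sample_size=None, rng=None):
    """Mean leave-one-out log-likelihood of a kernel conditional density estimate.

    With sample_size=None every pair (i, j) is visited, which costs O(n^2).
    With a smaller sample_size, each inner sum runs over a uniform random
    subset of the other points (a naive stand-in, assumed for illustration).
    """
    rng = rng or random.Random()
    n = len(xs)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if sample_size is not None and sample_size < len(others):
            others = rng.sample(others, sample_size)
        # Numerator: joint kernel mass at (x_i, y_i); denominator: mass at x_i.
        num = sum(gauss(ys[i] - ys[j], hy) * gauss(xs[i] - xs[j], hx) for j in others)
        den = sum(gauss(xs[i] - xs[j], hx) for j in others)
        total += math.log(max(num, TINY) / max(den, TINY))
    return total / n


# Toy data (an assumption for the demo): y = sin(2*pi*x) plus Gaussian noise.
data_rng = random.Random(0)
xs = [data_rng.random() for _ in range(200)]
ys = [math.sin(2.0 * math.pi * x) + data_rng.gauss(0.0, 0.1) for x in xs]

exact = loo_log_likelihood(xs, ys, hx=0.1, hy=0.2)
approx = loo_log_likelihood(xs, ys, hx=0.1, hy=0.2, sample_size=50, rng=random.Random(1))
print("exact:", exact, "subsampled:", approx)
```

Bandwidth selection would then evaluate this score over candidate pairs (hx, hy) and keep the best one, which is why the quadratic cost of each exact evaluation matters at large n.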
Year: 2010
DOI: 10.1016/j.csda.2010.01.011
Venue: Computational Statistics & Data Analysis
Keywords: fast algorithms, strong error control, monte carlo algorithm, scalability, kernel conditional density estimation, standard deterministic dual-tree approximation, monte carlo dual-tree algorithm, high speedup, formal error guarantee, monte carlo, dual-tree monte carlo approach, loose deterministic, high acceleration, fast kernel, dual-tree, probabilistic monte carlo, large datasets, deterministic dual-tree framework, conditional density estimation, density estimation, cross validation, error control
Field: Econometrics, Monte Carlo method, Monte Carlo algorithm, Markov chain Monte Carlo, Quasi-Monte Carlo method, Hybrid Monte Carlo, Diffusion Monte Carlo, Monte Carlo integration, Statistics, Approximation error, Mathematics
DocType: Journal
Volume: 54
Issue: 7
ISSN: Computational Statistics and Data Analysis
Citations: 6
PageRank: 0.60
References: 4
Authors: 3
Name                Order  Citations  PageRank
Michael P. Holmes   1      115        7.15
Alexander G. Gray   2      990        80.16
Charles L. Isbell   3      504        65.79