Title
Unbiased Estimation of the Value of an Optimized Policy.
Abstract
Randomized trials, also known as A/B tests, are used to select between two policies: a control and a treatment. Given a corresponding set of features, we can ideally learn an optimized policy P that maps the A/B test features to the action space and optimizes reward. However, although A/B testing provides an unbiased estimator for the value of deploying B (i.e., switching from policy A to policy B), directly applying the same samples to learn the optimized policy P generally does not yield an unbiased estimator of the value of P, because those samples were already observed when constructing P. In situations where the costs and risks associated with deploying a policy are high, such an unbiased estimator is highly desirable. We present a procedure for learning optimized policies and obtaining unbiased estimates of the value of deploying them. We wrap any policy learning procedure with a bagging process and obtain out-of-bag policy inclusion decisions for each sample. We then prove that the inverse-propensity-weighting effect estimator is unbiased when applied to the optimized subset. Likewise, we apply the same idea to obtain out-of-bag, unbiased per-sample value estimates of the measurement that are independent of the randomized treatment, and use these estimates to build an unbiased doubly-robust effect estimator. Lastly, we show empirically that even when the average treatment effect is negative, we can find an optimized policy with positive value.
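As a rough illustration of the procedure the abstract describes (not the authors' implementation), the following Python sketch wraps a user-supplied policy learner in a bagging loop to obtain out-of-bag inclusion decisions, then applies an inverse-propensity-weighting estimator to the optimized subset. The names fit_policy, oob_policy_decisions, and ipw_value, and the assumption of a known constant treatment propensity p, are hypothetical.

import numpy as np

def oob_policy_decisions(X, reward, treat, fit_policy, n_bags=100, seed=0):
    # For each sample, collect treatment-inclusion decisions only from
    # policies trained on bootstrap bags that exclude that sample.
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = np.zeros(n)       # out-of-bag "include in treatment" votes
    counts = np.zeros(n)      # number of bags where each sample was out-of-bag
    for _ in range(n_bags):
        bag = rng.integers(0, n, size=n)         # bootstrap bag of indices
        oob = np.setdiff1d(np.arange(n), bag)    # samples left out of the bag
        policy = fit_policy(X[bag], reward[bag], treat[bag])
        votes[oob] += policy(X[oob])             # policy returns 0/1 per row
        counts[oob] += 1
    return votes / np.maximum(counts, 1) > 0.5   # majority out-of-bag decision

def ipw_value(reward, treat, include, p=0.5):
    # Inverse-propensity-weighted estimate of the per-sample value of
    # deploying the optimized policy: reweight observed rewards on the
    # subset the policy includes, assuming treatment propensity p.
    treated = include & (treat == 1)
    control = include & (treat == 0)
    return (reward[treated].sum() / p
            - reward[control].sum() / (1 - p)) / len(reward)

Given A/B test data (X, reward, treat) with treatment randomized at propensity p, calling include = oob_policy_decisions(X, reward, treat, fit_policy) and then ipw_value(reward, treat, include, p) yields the kind of unbiased value estimate the abstract claims, since each sample's inclusion decision comes only from policies that never saw it.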
Year
2018
Venue
arXiv: Learning
Field
Mathematical optimization, Average treatment effect, Policy learning, Bias of an estimator, Test data, Unbiased Estimation, Mathematics, Estimator
DocType
Journal
Volume
abs/1806.02794
Citations
0
PageRank
0.34
References
5
Authors
2
Name                     Order  Citations  PageRank
Elon Portugaly           1      286        25.89
Joseph J. Pfeiffer III   2      60         5.95