Unbiased Estimation of the Value of an Optimized Policy. - Citegraph

Paper Info

Title
Unbiased Estimation of the Value of an Optimized Policy.

Abstract
Randomized trials, also known as A/B tests, are used to select between two policies: a control and a treatment. Given a corresponding set of features, we can ideally learn an optimized policy P that maps the A/B test data features to action space and optimizes reward. However, although A/B testing provides an unbiased estimator for the value of deploying B (i.e., switching from policy A to B), direct application of those samples to learn the the optimized policy P generally does not provide an unbiased estimator of the value of P as the samples were observed when constructing P. In situations where the cost and risks associated of deploying a policy are high, such an unbiased estimator is highly desirable. present a procedure for learning optimized policies and getting unbiased estimates for the value of deploying them. We wrap any policy learning procedure with a bagging process and obtain out-of-bag policy inclusion decisions for each sample. We then prove that inverse-propensity-weighting effect estimator is unbiased when applied to the optimized subset. Likewise, we apply the same idea to obtain out-of-bag unbiased per-sample value estimate of the measurement that is independent of the randomized treatment, and use these estimates to build an unbiased doubly-robust effect estimator. Lastly, we empirically shown that even when the average treatment effect is negative we can find a positive optimized policy.

Year	Venue	Field
2018	arXiv: Learning	Mathematical optimization,Average treatment effect,Policy learning,Bias of an estimator,Test data,Unbiased Estimation,Mathematics,Estimator
DocType	Volume	Citations
Journal	abs/1806.02794	0
PageRank	References	Authors
0.34	5	2

Authors (2 rows)

Cited by (0 rows)

References (5 rows)

Name	Order	Citations	PageRank
Elon Portugaly	1	286	25.89
Joseph J. Pfeiffer III	2	60	5.95

1