Title
Reducing Variance in Gradient Bandit Algorithm using Antithetic Variates Method.
Abstract
Policy gradient, which uses the Monte Carlo method to obtain an unbiased estimate of the parameter gradients, is widely used in reinforcement learning. A key issue in policy gradient is reducing the variance of that estimate. From a statistical viewpoint, policy gradient with baseline, a successful variance reduction method for policy gradient, directly applies the control variates method, a traditional variance reduction technique in Monte Carlo estimation. One problem with the control variates method is that the quality of the estimate depends heavily on the choice of control variate. To address this issue, and inspired by the antithetic variates method for variance reduction, we propose combining the antithetic variates method with traditional policy gradient for the multi-armed bandit problem. The result is a new policy gradient algorithm called Antithetic-Arm Bandit (AAB). In AAB, the gradient is estimated through coordinate ascent, where at each iteration the gradient of the target arm is estimated by: 1) constructing a sequence of arms that is approximately monotonic in terms of estimated gradients, 2) sampling a pair of antithetic arms over the sequence, and 3) re-estimating the target gradient based on the sampled pair. Theoretical analysis shows that AAB yields an unbiased, variance-reduced estimate. Experimental results on a multi-armed bandit task show that AAB can achieve state-of-the-art performance.
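The antithetic variates idea that the abstract builds on can be illustrated with a minimal Monte Carlo sketch. This is not the AAB algorithm itself: it only shows the general technique of pairing each uniform draw u with its mirror 1 - u, which lowers the estimator's variance when the integrand is monotonic. The function names, the test integrand u^2, and the sample sizes are all assumptions chosen for illustration.

```python
import random
import statistics


def plain_mc(f, n, rng):
    """Average f over n independent uniform draws on [0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n


def antithetic_mc(f, n, rng):
    """Average f over n // 2 antithetic pairs (u, 1 - u); n should be even."""
    total = 0.0
    for _ in range(n // 2):
        u = rng.random()
        # The two evaluations are negatively correlated when f is monotonic.
        total += f(u) + f(1.0 - u)
    return total / n


def estimator_variance(estimator, f, n, repeats, seed):
    """Empirical variance of an estimator over repeated independent runs."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(f, n, rng) for _ in range(repeats)])


def square(u):
    """Hypothetical monotonic integrand; its true mean on [0, 1] is 1/3."""
    return u * u


var_plain = estimator_variance(plain_mc, square, 100, 2000, seed=0)
var_anti = estimator_variance(antithetic_mc, square, 100, 2000, seed=1)
print("plain MC variance:     ", var_plain)
print("antithetic MC variance:", var_anti)
```

Both estimators use the same number of function evaluations, so any drop in variance comes from the negative correlation within each pair. The abstract's step of building an approximately monotonic sequence of arms presumably serves the same purpose, since the pairing relies on monotonicity.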
Year: 2018
DOI: 10.1145/3209978.3210068
Venue: SIGIR
Keywords: Policy gradient, Antithetic variates, Coordinate gradient
Field: Monotonic function, Monte Carlo method, Computer science, Control variates, Algorithm, Unbiased Estimation, Sampling (statistics), Antithetic variates, Variance reduction, Reinforcement learning
DocType: Conference
ISBN: 978-1-4503-5657-2
Citations: 0
PageRank: 0.34
References: 1
Authors: 5
Name | Order | Citations | PageRank
Sihao Yu | 1 | 0 | 0.34
Jun Xu | 2 | 0 | 1.01
Yanyan Lan | 3 | 1005 | 63.59
Jiafeng Guo | 4 | 1737 | 102.17
Xueqi Cheng | 5 | 3148 | 247.04