Title: Smoothed Dual Embedding Control
Abstract: We revisit the Bellman optimality equation with Nesterov's smoothing technique and provide a unique saddle-point optimization perspective on the policy optimization problem in reinforcement learning, based on Fenchel duality. A new reinforcement learning algorithm, called Smoothed Dual Embedding Control (SDEC), is derived to solve the saddle-point reformulation with an arbitrary learnable function approximator. The algorithm bypasses the policy evaluation step of policy optimization in a principled way and extends naturally to multi-step bootstrapping and eligibility traces. We provide a PAC-learning bound on the number of samples needed from a single off-policy sample path, and also characterize the convergence of the algorithm. Finally, we show that the algorithm compares favorably to state-of-the-art baselines on several benchmark control problems.
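The saddle-point reformulation described in the abstract can be sketched as follows. This is a hedged reconstruction from the abstract's own description, not text from this record; the symbols (V for the value function, pi for the policy, nu for the dual function, lambda > 0 for the smoothing parameter, H for the entropy) are assumed notation. Nesterov smoothing replaces the hard max over actions in the Bellman optimality equation with an entropy-regularized max:

```latex
% Entropy-smoothed Bellman optimality equation (assumed notation):
V(s) \;=\; \max_{\pi(\cdot\mid s)} \;
  \mathbb{E}_{a\sim\pi(\cdot\mid s)}\!\bigl[\, R(s,a)
    + \gamma\,\mathbb{E}_{s'\mid s,a}[V(s')] \,\bigr]
  \;+\; \lambda\, H\bigl(\pi(\cdot\mid s)\bigr).

% Squaring the resulting smoothed Bellman residual and applying the
% Fenchel dual of the square function, x^2 = \max_{\nu} (2\nu x - \nu^2),
% turns policy optimization into a min-max (saddle-point) problem
% over (V, \pi) and a dual function \nu:
\min_{V,\,\pi}\; \max_{\nu}\;
  \mathbb{E}_{s,a,s'}\!\bigl[\, 2\,\nu(s,a)\,\delta(s,a,s')
    - \nu(s,a)^2 \,\bigr],
\qquad
\delta(s,a,s') \;=\; R(s,a) + \gamma\, V(s')
    - \lambda \log \pi(a\mid s) - V(s).
```

Under this sketch, the dual function nu absorbs the conditional expectation over the next state s', which is what allows the objective to be estimated from single off-policy transitions, consistent with the single-sample-path PAC bound mentioned in the abstract.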
Year: 2017
Venue: arXiv: Learning
Field: Convergence (routing), Saddle, Mathematical optimization, Embedding, Bootstrapping, Computer science, Smoothing, Sample path, Optimization problem, Reinforcement learning
DocType:
Volume: abs/1712.10285
Citations: 3
Journal:
PageRank: 0.39
References: 0
Authors: 7
Author list (Order, Name, Citations/PageRank):
1  Bo Dai        23034.71
2  Albert Shaw   262.45
3  Lihong Li     67045.28
4  Xiao, Lin     91853.00
5  Niao He       21216.52
6  Jianshu Chen  88352.94
7  Le Song       2437159.27