State-Dependent Exploration for Policy Gradient Methods - Citegraph

Paper Info

Title
State-Dependent Exploration for Policy Gradient Methods

Abstract
Policy gradient methods are model-free reinforcement learning algorithms that have in recent years been applied successfully to many real-world problems. Typically, Likelihood Ratio (LR) methods are used to estimate the gradient, but they suffer from high variance due to random exploration at every time step of each training episode. Our solution to this problem is to introduce a state-dependent exploration function (SDE) that, during an episode, returns the same action for any given state. This results in less variance per episode and faster convergence. SDE also finds solutions overlooked by other methods and even improves upon state-of-the-art gradient estimators such as Natural Actor-Critic. We systematically derive SDE and apply it to several illustrative toy problems and a challenging robotics simulation task, where SDE greatly outperforms random exploration.
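The contrast the abstract draws, fresh noise at every time step versus an exploration function that gives the same action for the same state within an episode, can be illustrated with a minimal sketch. It assumes a hypothetical linear controller and a linear exploration function whose parameters are drawn from a Gaussian once at the start of each episode; the class and method names (`LinearPolicy`, `RandomExploration`, `StateDependentExploration`, `start_episode`, `act`) are invented for illustration and are not taken from the paper.

```python
import random


class LinearPolicy:
    """Hypothetical deterministic linear controller: a = w . s."""

    def __init__(self, weights):
        self.weights = list(weights)

    def mean_action(self, state):
        return sum(w * x for w, x in zip(self.weights, state))


class RandomExploration:
    """Per-step exploration: fresh Gaussian noise is added on every call."""

    def __init__(self, policy, sigma, rng):
        self.policy = policy
        self.sigma = sigma
        self.rng = rng

    def start_episode(self):
        pass  # nothing to resample; noise is drawn at each step instead

    def act(self, state):
        return self.policy.mean_action(state) + self.rng.gauss(0.0, self.sigma)


class StateDependentExploration:
    """Exploration as a function of state: eps(s) = w_hat . s.

    Assumption for this sketch: w_hat is drawn once per episode, so within
    an episode the same state always maps to the same action.
    """

    def __init__(self, policy, sigma, rng):
        self.policy = policy
        self.sigma = sigma
        self.rng = rng
        self.start_episode()

    def start_episode(self):
        # Resample the exploration parameters, one per policy weight.
        self.explore_weights = [
            self.rng.gauss(0.0, self.sigma) for _ in self.policy.weights
        ]

    def act(self, state):
        noise = sum(w * x for w, x in zip(self.explore_weights, state))
        return self.policy.mean_action(state) + noise


if __name__ == "__main__":
    policy = LinearPolicy([0.5, -1.0])
    state = [1.0, 2.0]
    sde = StateDependentExploration(policy, sigma=0.1, rng=random.Random(0))
    per_step = RandomExploration(policy, sigma=0.1, rng=random.Random(1))
    print("SDE, same state twice:      ", sde.act(state), sde.act(state))
    print("Per-step, same state twice: ", per_step.act(state), per_step.act(state))
```

Calling `start_episode()` between episodes draws a new exploration function. Within an episode, revisiting a state reproduces the same action under `StateDependentExploration`, whereas `RandomExploration` perturbs the action differently on every call.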

Year: 2008
DOI: 10.1007/978-3-540-87481-2_16
Venue: ECML/PKDD
Keywords: faster convergence, random exploration, derive sde, challenging robotics, natural actor-critic, state-dependent exploration function, policy gradient methods, policy gradient method, state-of-the-art gradient estimator, state-dependent exploration, likelihood ratio, training episode, reinforcement learning, gradient method
Field: Convergence (routing), State dependent, Mathematical optimization, Artificial intelligence, Robotics, Mathematics, Estimator, Reinforcement learning
DocType: Conference
Volume: 5212
ISSN: 0302-9743
Citations: 22
PageRank: 1.98
References: 8
Authors: 3

Authors

Name	Author order	Citations (author)	PageRank (author)
Thomas Rückstieß	1	112	20.66
Martin Felder	2	56	18.59
Jürgen Schmidhuber	3	17836	1238.63