Title
Multimodal Parameter-exploring Policy Gradients
Abstract
Policy Gradients with Parameter-based Exploration (PGPE) is a model-free reinforcement learning method that alleviates the problem of high-variance gradient estimates encountered in standard policy gradient methods. It has been shown to drastically speed up convergence on several large-scale reinforcement learning tasks. However, the independent normal distributions used by PGPE to search through parameter space are inadequate for some problems with multimodal reward surfaces. This paper extends the basic PGPE algorithm to use a multimodal mixture distribution for each parameter, while remaining efficient. Experimental results on the Rastrigin function and the inverted pendulum benchmark demonstrate the advantages of this modification, with faster convergence to better optima.
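The abstract describes the core mechanism: PGPE samples controller parameters from per-parameter distributions and adjusts those distributions' hyperparameters along a reward-gradient estimate, and the multimodal variant replaces each parameter's single Gaussian with a Gaussian mixture. The sketch below illustrates that idea on the Rastrigin benchmark mentioned above. It is a simplified illustration under stated assumptions, not the paper's exact algorithm: the mixture weights stay fixed and only the sampled component of each parameter receives the standard PGPE mean/variance update, whereas the paper derives the full update for the mixture. All hyperparameter values (`n`, `K`, `alpha`, the clipping bounds) are illustrative assumptions.

```python
import numpy as np

def rastrigin(x):
    """Rastrigin test function (a benchmark used in the paper); global minimum 0 at x = 0."""
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
n, K = 10, 2                          # parameter dimension, mixture components per parameter
w = np.full((n, K), 1.0 / K)          # fixed mixture weights (the paper also adapts the mixture)
mu = rng.uniform(-4.0, 4.0, (n, K))   # component means
sigma = np.full((n, K), 2.0)          # component standard deviations
alpha = 0.02                          # step size (illustrative value)
r_mean, r_std = None, 1.0             # running reward statistics used as a baseline
idx = np.arange(n)

for step in range(3000):
    # Per parameter: pick one mixture component, then sample from its Gaussian.
    k = np.array([rng.choice(K, p=w[i]) for i in range(n)])
    theta = rng.normal(mu[idx, k], sigma[idx, k])
    r = -rastrigin(theta)             # episodic reward = negative cost
    if r_mean is None:
        r_mean = r                    # initialize the baseline from the first sample
    adv = np.clip((r - r_mean) / (r_std + 1e-8), -3.0, 3.0)
    # PGPE-style hyperparameter update, applied only to the sampled components.
    m, s = mu[idx, k], sigma[idx, k]
    mu[idx, k] += alpha * adv * (theta - m)
    sigma[idx, k] += alpha * adv * ((theta - m) ** 2 - s**2) / s
    sigma = np.clip(sigma, 1e-3, None)
    r_mean = 0.99 * r_mean + 0.01 * r
    r_std = 0.99 * r_std + 0.01 * abs(r - r_mean)

best = min(rastrigin(mu[:, j]) for j in range(K))
print("cost at best component means:", best)
```

Because each parameter keeps several component means, the search can hold on to more than one promising region of the multimodal reward surface at once, which is the advantage the abstract claims over a single normal distribution per parameter.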
Year
2010
DOI
10.1109/ICMLA.2010.24
Venue
ICMLA
Keywords
multimodal parameter-exploring policy gradients, normal policy gradient method, faster convergence, basic PGPE algorithm, independent normal distribution, high-variance gradient, multimodal mixture distribution, large-scale reinforcement learning, model-free reinforcement learning, parameter space, multimodal reward surface, inverted pendulum, gradient method, reinforcement learning, learning (artificial intelligence), normal distribution, mixture distribution
Field
Convergence (routing), Mathematical optimization, Normal distribution, Inverted pendulum, Computer science, Rastrigin function, Parameter space, Artificial intelligence, Machine learning, Reinforcement learning, Speedup
DocType
Conference
Citations
4
PageRank
0.45
References
7
Authors
4
Name                   Order  Citations  PageRank
Frank Sehnke           1      527        39.18
Alex Graves            2      8572       405.10
Christian Osendorfer   3      125        13.24
Jürgen Schmidhuber     4      17836      1238.63