Title
Model-Free Least-Squares Policy Iteration
Abstract
We propose a new approach to reinforcement learning which combines least-squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least-squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it has heretofore lacked a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states. Our new algorithm, Least-Squares Policy Iteration (LSPI), addresses these issues. The result is an off-policy method which can use (or reuse) data collected from any source. We have tested LSPI on several problems, including a bicycle simulator in which it learns to guide the bicycle to a goal efficiently by observing only a relatively small number of completely random trials.
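A minimal sketch of the evaluate/improve loop the abstract describes: an LSTD-style least-squares solve estimates the current policy's Q-function from a fixed batch of off-policy samples, alternated with greedy policy improvement, assuming a linear value function Q(s,a) ≈ wᵀφ(s,a). The feature map `phi`, the `done` flag, and the small ridge term are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def lstdq(samples, phi, policy, k, gamma=0.95):
    """Fit linear weights w so that Q(s, a) ~= w @ phi(s, a) for the given
    policy, from a fixed batch of (s, a, r, s_next, done) tuples.
    phi(s, a) returns a k-dimensional feature vector (illustrative names)."""
    A = np.eye(k) * 1e-6          # tiny ridge term keeps A invertible (assumption)
    b = np.zeros(k)
    for s, a, r, s_next, done in samples:
        phi_sa = phi(s, a)
        # Next-state features follow the policy currently being evaluated.
        phi_next = np.zeros(k) if done else phi(s_next, policy(s_next))
        A += np.outer(phi_sa, phi_sa - gamma * phi_next)
        b += phi_sa * r
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, k, gamma=0.95, n_iters=20, tol=1e-4):
    """Alternate least-squares policy evaluation with greedy improvement
    until the weight vector stops changing."""
    w = np.zeros(k)
    for _ in range(n_iters):
        # Greedy policy with respect to the current Q estimate.
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, policy, k, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

Because the samples are a fixed batch collected from any source, each policy-iteration step reuses the same data; only the next-state action (and hence `phi_next`) changes between iterations.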
Year
2001
Venue
NIPS
Keywords
temporal difference, least squares, data collection, function approximation, randomized trials, temporal difference learning, reinforcement learning
Field
Small number, Least squares, Mathematical optimization, Temporal difference learning, Function approximation, Reuse, Computer science, Least squares temporal difference, Artificial intelligence, Machine learning, Reinforcement learning
DocType
Conference
Citations
40
PageRank
3.68
References
13
Authors
2
Name                   Order  Citations  PageRank
Michail G. Lagoudakis  1      1164       79.51
Ronald Parr            2      2428       186.85