Error bounds for constant step-size Q-learning. - Citegraph

Paper Info

Title
Error bounds for constant step-size Q-learning.

Abstract
We provide a bound on the first moment of the error in the Q-function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted reward Markov decision problems. Motivated by Tsitsiklis’ proof for the decreasing step-size case, we decompose the Q-learning update equations into a dynamical system driven by a noise sequence and another dynamical system whose state variable is the Q-learning error, i.e., the difference between the true Q-function and the estimate. A natural persistence of excitation condition allows us to sample the system periodically and derive a simple scalar difference equation from which the convergence properties and bounds on the error of the Q-learning algorithm can be derived.

Year	DOI	Venue
2012	10.1016/j.sysconle.2012.08.014	Systems & Control Letters
Keywords	Field	DocType
Q-learning,Markov decision processes,Stochastic approximation	Differential equation,Mathematical optimization,Markov chain,Q-learning,Markov decision process,Moment (mathematics),State variable,Stochastic approximation,Dynamical system,Mathematics	Journal
Volume	Issue	ISSN
61	12	0167-6911
Citations	PageRank	References
3	0.62	4
Authors
2

Authors (2 rows)

Cited by (3 rows)

References (4 rows)

Name	Order	Citations	PageRank
Carolyn L. Beck	1	401	60.19
Srikant, R.	2	6868	544.90

1