Abstract
---
We consider fixed step-size Q-learning algorithms applied to discounted-reward Markov decision problems (MDPs) with finite state and action spaces. In previous work, we derived a bound on the first moment of the Q-value estimation error, specifically on the expected steady-state value of the infinity norm of the error. The goal, in both this paper and the previous one, is to maximize a discounted sum of rewards over an infinite time horizon. However, the bound derived in our previous work holds only when the step-size is sufficiently, and sometimes impractically, small. In this paper, we present a new error bound that, as before, goes to zero as the step-size goes to zero, but that is also valid for all values of the step-size. To obtain the new bound, we divide time into frames such that the probability that some state is not visited within a frame is strictly less than 1. The error bound is then obtained by sampling the system once in every frame.
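To make the setting concrete, below is a minimal sketch of fixed step-size Q-learning on a small, randomly generated MDP; the MDP, its dimensions, the exploration scheme, and the step-size value are illustrative assumptions, not taken from the paper. Note that the frame-based sampling described in the abstract is an analysis device for bounding the steady-state error, not a change to the algorithm; the sketch simply runs the constant step-size update and reports the resulting infinity-norm error against Q* computed by value iteration.

```python
# Illustrative sketch (not the authors' code) of constant step-size Q-learning
# on a randomly generated finite MDP. All problem parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
gamma = 0.9   # discount factor
alpha = 0.1   # fixed (constant) step-size

# Random transition kernel P[s, a, s'] and bounded rewards r[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Compute Q* by value iteration, to measure the sup-norm estimation error.
Q_star = np.zeros((n_states, n_actions))
for _ in range(2000):
    Q_star = r + gamma * P @ Q_star.max(axis=1)

# Asynchronous Q-learning with a fixed step-size, exploring uniformly at random.
Q = np.zeros((n_states, n_actions))
s = 0
for t in range(200_000):
    a = rng.integers(n_actions)                # uniform exploration (assumed)
    s_next = rng.choice(n_states, p=P[s, a])   # sample one transition
    td_target = r[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])   # constant step-size update
    s = s_next

# With a fixed step-size this error does not vanish as t grows, but (per the
# paper's result) its expected steady-state value shrinks as alpha -> 0.
print("||Q - Q*||_inf =", np.abs(Q - Q_star).max())
```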
Year | DOI | Venue
---|---|---|
2013 | 10.1109/ACC.2013.6580117 | American Control Conference

Keywords | Field | DocType
---|---|---|
Markov processes, decision making, error statistics, finite state machines, infinite horizon, learning (artificial intelligence), MDPs, Q-value estimation error, constant step-size Q-learning, discounted reward Markov decision problems, finite action space, finite state space, fixed step-size Q-learning algorithms, infinite time horizon, steady-state value, system sampling, upper bounds | Applied mathematics, Uniform norm, Combinatorics, Decision problem, Markov process, Time horizon, Control theory, Markov chain, Q-learning, Moment (mathematics), Markov kernel, Mathematics | Conference

ISSN | ISBN | Citations
---|---|---|
0743-1619 | 978-1-4799-0177-7 | 0

PageRank | References | Authors
---|---|---|
0.34 | 2 | 2

Name | Order | Citations | PageRank |
---|---|---|---|
Carolyn L. Beck | 1 | 401 | 60.19 |
Srikant, R. | 2 | 6868 | 544.90 |