Title
Variance Reduction Methods for Sublinear Reinforcement Learning.
Abstract
This work considers the problem of provably optimal reinforcement learning for episodic finite-horizon MDPs, i.e. how an agent learns to maximize its (long-term) reward in an uncertain environment. The main contribution is a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret (for any constant $\epsilon$) is $O(SA)$, a number of samples far smaller than that required to learn any (non-trivial) estimate of the transition model, which is specified by $O(S^2A)$ parameters. The importance of sub-linear algorithms is largely the motivation for $Q$-learning and other model-free approaches. vUCQ also enjoys minimax-optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. vUCQ is a successive refinement method in which the algorithm reduces the variance of its $Q$-value estimates and couples this estimation scheme with an upper-confidence-based exploration algorithm. Technically, the coupling of these two techniques is what gives the algorithm both the sub-linear regret property and the (asymptotically) optimal regret.
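The abstract's recipe couples two ingredients: upper-confidence exploration bonuses on $Q$-value estimates and a variance-reduction refinement of those estimates. The sketch below illustrates only the upper-confidence ingredient (optimistic tabular Q-learning with a Hoeffding-style bonus) on a randomly generated toy episodic MDP; the environment, the bonus scale, and all names are illustrative assumptions, not the authors' vUCQ, and in particular the successive-refinement variance reduction is omitted.

```python
import numpy as np

# Illustrative sketch, not the paper's vUCQ: tabular Q-learning with a
# Hoeffding-style upper-confidence bonus on a toy episodic MDP. The toy
# environment, the bonus scale c, and the episode budget are assumptions.

rng = np.random.default_rng(0)
S, A, H, K = 5, 2, 4, 2000                 # states, actions, horizon, episodes
P = rng.dirichlet(np.ones(S), size=(S, A)) # random transition kernel: P[s, a] is a distribution over next states
R = rng.uniform(size=(S, A))               # deterministic mean rewards in [0, 1]

Q = np.full((H, S, A), float(H))           # optimistic initialization (Q <= H always holds)
V = np.zeros((H + 1, S))                   # V[H] = 0 at the terminal step
N = np.zeros((H, S, A), dtype=int)         # visit counts per (step, state, action)
c = 0.5                                    # bonus scale (a tunable assumption)

for k in range(K):
    s = 0                                  # fixed start state
    for h in range(H):
        a = int(np.argmax(Q[h, s]))        # act greedily w.r.t. the optimistic Q
        r = R[s, a]
        s_next = rng.choice(S, p=P[s, a])
        N[h, s, a] += 1
        t = N[h, s, a]
        alpha = (H + 1) / (H + t)          # step size common in UCB Q-learning analyses
        bonus = c * np.sqrt(H**3 * np.log(S * A * H * K) / t)  # Hoeffding-style bonus
        Q[h, s, a] += alpha * (r + V[h + 1, s_next] + bonus - Q[h, s, a])
        V[h, s] = min(float(H), Q[h, s].max())  # keep the optimistic value clipped at H
        s = s_next

print("optimistic first-step action values:", Q[0, 0])
```

The design choice this sketch shares with the abstract's approach is that exploration is driven entirely by the bonus added to the $Q$-update, so no transition model is ever estimated, which is what makes $O(SA)$-style sample costs possible; a vUCQ-style refinement would additionally recenter the update targets around a reference value estimate to shrink their variance.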
Year: 2018
Venue: arXiv: Artificial Intelligence
Field: Sublinear function, Data mining, Minimax, Combinatorics, Regret, Computer science, Upper and lower bounds, Horizon, Variance reduction, Asymptotically optimal algorithm, Reinforcement learning
DocType: Journal
Volume: abs/1802.09184
Citations: 3
PageRank: 0.40
References: 8
Authors: 3
Name          Order  Citations  PageRank
Sham Kakade   1      43652      82.77
Mengdi Wang   2      1072       5.67
Lin Yang      3      312        1.21