Abstract
---
We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel Q-learning policy with adaptive data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions that are frequently visited in historical trajectories and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal Q-function and the joint space, without sacrificing worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require an optimal discretization as input and/or access to a simulation oracle. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in much better performance compared to both heuristics and Q-learning with uniform discretization.
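The mechanism described in the abstract is easy to sketch: keep an optimistic Q-estimate and a visit count per cell of a partition tree, act greedily over the cells covering the current state, and split a cell once it has been visited enough relative to its size. The Python sketch below is illustrative only, not the algorithm from the paper: the class names (`Cell`, `AdaptiveQLearner`), the splitting threshold, the bonus constants, and the use of a single partition (rather than one per episode step) are all simplifying assumptions.

```python
# Illustrative sketch of Q-learning with adaptive discretization.
# NOT the paper's exact algorithm: names, the splitting threshold, and
# the bonus constants are assumptions, and a full episodic version
# would keep one partition per step h = 1, ..., H.

class Cell:
    """A rectangular cell of the (state, action) partition, assuming
    1-D state and action spaces, both normalized to [0, 1]."""

    def __init__(self, s_lo, s_hi, a_lo, a_hi, q_init):
        self.s_lo, self.s_hi = s_lo, s_hi
        self.a_lo, self.a_hi = a_lo, a_hi
        self.q = q_init        # optimistic Q-estimate for the whole cell
        self.n = 0             # visit count
        self.children = []     # populated once the cell is refined

    def contains_state(self, s):
        return self.s_lo <= s <= self.s_hi

    def diameter(self):
        return max(self.s_hi - self.s_lo, self.a_hi - self.a_lo)

    def split(self):
        # Refine into four children by halving both dimensions;
        # children inherit the parent's Q-estimate.
        sm = (self.s_lo + self.s_hi) / 2
        am = (self.a_lo + self.a_hi) / 2
        self.children = [
            Cell(lo_s, hi_s, lo_a, hi_a, self.q)
            for (lo_s, hi_s) in ((self.s_lo, sm), (sm, self.s_hi))
            for (lo_a, hi_a) in ((self.a_lo, am), (am, self.a_hi))
        ]


class AdaptiveQLearner:
    def __init__(self, horizon, q_init=1.0):
        self.H = horizon
        self.root = Cell(0.0, 1.0, 0.0, 1.0, q_init)

    def _leaves_containing(self, cell, s, out):
        if cell.children:
            for child in cell.children:
                self._leaves_containing(child, s, out)
        elif cell.contains_state(s):
            out.append(cell)
        return out

    def select(self, s):
        """Greedy rule: among leaf cells containing s, take the one with
        the highest Q-estimate and play the midpoint of its action range."""
        best = max(self._leaves_containing(self.root, s, []), key=lambda c: c.q)
        return best, (best.a_lo + best.a_hi) / 2

    def value(self, s):
        return max(c.q for c in self._leaves_containing(self.root, s, []))

    def update(self, cell, reward, next_value):
        cell.n += 1
        lr = (self.H + 1) / (self.H + cell.n)      # common episodic Q-learning step size
        bonus = cell.n ** -0.5 + cell.diameter()   # optimism + discretization bias (illustrative)
        cell.q = (1 - lr) * cell.q + lr * (reward + next_value + bonus)
        # Adaptive refinement: split once visits reach ~ diameter^{-2},
        # so frequently visited regions get a finer partition.
        if not cell.children and cell.n >= cell.diameter() ** -2:
            cell.split()


# Toy usage (the environment step below is a stand-in):
agent = AdaptiveQLearner(horizon=5)
s = 0.3
cell, a = agent.select(s)
r, s_next = 0.7, 0.9
agent.update(cell, r, agent.value(s_next))
```

The splitting rule is what makes the discretization data-driven: cells that attract many visits, and hence matter for the policy, are refined, while rarely visited regions stay coarse.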
Year | DOI | Venue
---|---|---
2020 | 10.1145/3393691.3394176 | SIGMETRICS '20: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Boston, MA, USA, June 2020
DocType | Volume | Issue
---|---|---
Conference | 3 | 3

ISBN | Citations | PageRank
---|---|---
978-1-4503-7985-4 | 0 | 0.34

References | Authors
---|---
0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Sean R. Sinclair | 1 | 0 | 1.01 |
Siddhartha Banerjee | 2 | 185 | 22.85 |
Christina Lee Yu | 3 | 1 | 1.39 |