Paper Info

Title
Scale-free adaptive planning for deterministic dynamics & discounted rewards.

Abstract
We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget where the ranges of both rewards and noise are unknown. We introduce PlaTypOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTypOOS dynamically adapts its behavior to both. This makes PlaTypOOS immune to two vulnerabilities of OLOP: failure when given underestimated ranges of noise and rewards, and inefficiency when these are overestimated. PlaTypOOS additionally adapts to the global smoothness of the value function. We show that PlaTypOOS acts in a provably more efficient manner than OLOP when OLOP is given an overestimated reward, and that, in the case of no noise, PlaTypOOS learns exponentially faster.

Year	Venue	DocType
2019	International Conference on Machine Learning	Conference
Citations	PageRank	References
0	0.34	0
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Peter L. Bartlett	1	5482	1039.97
Victor Gabillon	2	116	9.51
Jennifer Healey	3	1643	285.32
Michal Valko	4	212	37.24