Constrained Upper Confidence Reinforcement Learning - Citegraph

Paper Info

Title
Constrained Upper Confidence Reinforcement Learning

Abstract
Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well motivated by a number of applications, including exploration of unknown, potentially unsafe, environments. We present an algorithm, C-UCRL, and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning, with probability $1-\delta$. Illustrative examples are provided.
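
To make the sub-linear claim concrete, the following minimal sketch evaluates the growth rate $T^{\frac{3}{4}}\sqrt{\log(T/\delta)}$ stated in the abstract, together with its per-step average. The function names and the hidden constant (set to 1 here) are assumptions for illustration only; the abstract gives the order of the bound, not its constants, and this is not the C-UCRL algorithm itself.

```python
import math


def regret_rate(T: int, delta: float) -> float:
    """Return T^(3/4) * sqrt(log(T / delta)).

    The constant hidden by the O(.) notation is assumed to be 1.
    """
    if T <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need T > 0 and 0 < delta < 1")
    return T ** 0.75 * math.sqrt(math.log(T / delta))


def average_regret_rate(T: int, delta: float) -> float:
    """Return the per-step average of the rate, i.e. regret_rate / T."""
    return regret_rate(T, delta) / T


if __name__ == "__main__":
    delta = 0.05  # hypothetical confidence parameter
    for T in (10**2, 10**4, 10**6, 10**8):
        total = regret_rate(T, delta)
        average = average_regret_rate(T, delta)
        print(f"T={T:>9}  rate={total:14.2f}  rate/T={average:.4f}")
```

For the horizons listed, the rate itself keeps growing while the per-step average shrinks, which is what sub-linear regret means: the average gap to the best feasible policy vanishes as $T$ grows.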

Year	Venue	DocType	Citations	PageRank	References	Authors
2020	L4DC	Conference	0	0.34	0	2

Authors (2 rows)

Name	Order	Citations	PageRank
Liyuan Zheng	1	0	0.68
Lillian J. Ratliff	2	87	23.32
