Title
Reinforcement learning for MDPs with constraints
Abstract
In this article, I consider Markov decision processes (MDPs) with two criteria, each defined as the expected value of an infinite-horizon cumulative return. Either the second criterion is itself subject to an inequality constraint, or there is a maximum allowable probability that the single returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
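The two problem variants named in the abstract can be written down roughly as follows. This is only a sketch for orientation: the symbols (policy \pi, per-step reward r_t, per-step cost c_t, discount factor \gamma, threshold d, and probability bound \omega) are assumptions introduced here, not notation taken from the paper, and the use of discounting in particular is assumed rather than stated in the abstract.

```latex
% Assumed notation (not from the record): policy \pi, reward r_t, cost c_t,
% discount \gamma \in (0,1), threshold d, allowed violation probability \omega.

% Variant 1: the expected value of the second return is constrained.
\max_{\pi} \; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
\quad \text{s.t.} \quad
\mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c_t\right] \le d

% Variant 2: the probability that a single return violates the bound is constrained.
\max_{\pi} \; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
\quad \text{s.t.} \quad
\Pr^{\pi}\!\left(\sum_{t=0}^{\infty} \gamma^{t} c_t > d\right) \le \omega
```

In the first variant only the average behavior of the second return is bounded, whereas the second variant bounds how often an individual trajectory may exceed the threshold.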
Year
2006
DOI
10.1007/11871842_63
Venue
ECML
Keywords
cumulative return, maximum allowable probability, markov decision processes, inequality constraint, expected value, single return, infinite horizon, control problem, new reinforcement, markov decision process, cumulant, reinforcement learning
Field
Mathematical optimization, Markov process, Markov decision process, Inequality, Expected value, Artificial intelligence, Infinite horizon, Mathematics, Reinforcement learning
DocType
Conference
Volume
4212
ISSN
0302-9743
ISBN
3-540-45375-X
Citations
15
PageRank
0.79
References
7
Authors
1
Name
Peter Geibel
Order
1
Citations
286
PageRank
26.62