Abstract |
---|
In this article, I consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that the single returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems. |
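
The two problem variants named in the abstract can be written down roughly as follows. This is only a sketch under stated assumptions: the symbols $r_t$, $c_t$, $\gamma$, $\beta$, and $\omega$ are not given in the abstract, discounting is assumed so that the infinite sums are finite, the first criterion is assumed to be the one being maximized, and the direction of the inequality is chosen for illustration.

```latex
% Assumed notation: r_t and c_t are the per-step signals behind the two returns,
% gamma in [0,1) is an assumed discount factor, pi is the policy.
R = \sum_{t=0}^{\infty} \gamma^{t} r_t ,
\qquad
C = \sum_{t=0}^{\infty} \gamma^{t} c_t

% Variant (a): the second criterion (an expected value) is itself constrained.
% beta is an assumed bound.
\max_{\pi} \; \mathbb{E}_{\pi}[R]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}[C] \le \beta

% Variant (b): the probability that a single return violates the bound is limited.
% omega is the maximum allowable probability.
\max_{\pi} \; \mathbb{E}_{\pi}[R]
\quad \text{subject to} \quad
\Pr\nolimits_{\pi}\left( C > \beta \right) \le \omega
```

Variant (a) constrains only the average of the second return, while variant (b) constrains how often an individual realization may exceed the bound, which is the stricter requirement when the return distribution has a heavy tail.
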
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/11871842_63 | ECML |
Keywords | Field | DocType |
cumulative return,maximum allowable probability,markov decision processes,inequality constraint,expected value,single return,infinite horizon,control problem,new reinforcement,markov decision process,cumulant,reinforcement learning | Mathematical optimization,Markov process,Markov decision process,Inequality,Expected value,Artificial intelligence,Infinite horizon,Mathematics,Reinforcement learning | Conference |
Volume | ISSN | ISBN |
4212 | 0302-9743 | 3-540-45375-X |
Citations | PageRank | References |
15 | 0.79 | 7 |
Authors |
---|
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peter Geibel | 1 | 286 | 26.62 |