Abstract |
---|
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future by concentrating on short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include the following: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies, and an optimal policy might not exist. We present an iterative algorithm for computing an epsilon-optimal nonstationary policy with a very simple structure. |
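A minimal numerical sketch of the kind of criterion the abstract describes, evaluated for one fixed stationary policy on a two-state chain. The transition matrix, rewards, discount factor, weight `lam`, and the `(1 - beta)` normalization of the discounted value are illustrative assumptions, not the paper's precise definition.

```python
import numpy as np

# Transition matrix and one-step rewards under a fixed stationary policy
# (hypothetical two-state example).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 5.0])
beta = 0.9  # discount factor (assumed)

# Discounted reward: solve (I - beta * P) v = r.
v_disc = np.linalg.solve(np.eye(2) - beta * P, r)

# Long-run average reward: g = pi_star @ r, where pi_star is the
# stationary distribution of P (unichain case), i.e. pi_star P = pi_star.
evals, evecs = np.linalg.eig(P.T)
pi_star = np.real(evecs[:, np.argmax(np.real(evals))])
pi_star /= pi_star.sum()
g = pi_star @ r  # here pi_star = (2/3, 1/3), so g = 7/3

# Weighted criterion (illustrative form): convex combination of the
# normalized discounted value and the average reward.
lam = 0.5
weighted = lam * (1 - beta) * v_disc + (1 - lam) * g
print(weighted)
```

Varying `lam` between 0 and 1 shifts the emphasis from the long-term average reward toward the short-term discounted reward, which is the trade-off the abstract attributes to the weighted criterion.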
Year | DOI | Venue |
---|---|---|
1992 | 10.1287/opre.40.6.1180 | Operations Research |

Field | DocType | Volume |
---|---|---|
Econometrics, Decision analysis, Mathematical optimization, Weighting, Discounting, Iterative method, Markov chain, Markov decision process, Decision theory, Reward-based selection, Mathematics | Journal | 40 |

Issue | ISSN | Citations |
---|---|---|
6 | 0030-364X | 10 |

PageRank | References | Authors |
---|---|---|
3.39 | 2 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Dmitry Krass | 1 | 483 | 82.08 |
Jerzy A. Filar | 2 | 120 | 23.36 |
Sagnik S. Sinha | 3 | 10 | 3.39 |