Title
A weighted Markov decision process
Abstract
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future, concentrating on short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on the short-term versus the long-term rewards by varying their weights. The mathematical implications of the new criterion include: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies; an optimal policy might not exist. We present an iterative algorithm for computing an epsilon-optimal nonstationary policy with a very simple structure.
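As a minimal sketch of the two criteria being combined, the snippet below evaluates a single fixed stationary policy on a hypothetical 2-state Markov chain: the discounted value, the long-run average reward, and a weighted combination of the two. The normalisation by (1 - beta), the weight lam, and the toy chain itself are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical 2-state Markov chain induced by one fixed stationary policy.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix under the policy
r = np.array([1.0, 0.0])     # one-step rewards
beta = 0.95                  # discount factor (assumed)
lam = 0.5                    # weight placed on the discounted criterion (assumed)

# Discounted value vector: v solves (I - beta * P) v = r.
v = np.linalg.solve(np.eye(2) - beta * P, r)

# Long-run average reward: g = mu . r, where mu is the stationary
# distribution of P (left eigenvector for eigenvalue 1, normalised).
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
mu = mu / mu.sum()
g = mu @ r

# Weighted criterion: scaling the discounted value by (1 - beta) puts both
# terms on a per-step scale before mixing them with weight lam.
w = lam * (1 - beta) * v + (1 - lam) * g
print("discounted value:", v)
print("average reward:", g)
print("weighted criterion:", w)
```

For this chain the stationary distribution is (2/3, 1/3), so the average reward is 2/3; varying lam between 0 and 1 shifts the criterion between the purely average and purely discounted extremes.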
Year: 1992
DOI: 10.1287/opre.40.6.1180
Venue: Operations Research
DocType: Journal
Volume: 40
Issue: 6
ISSN: 0030-364X
Citations: 10
PageRank: 3.39
References: 2
Field: Econometrics, Decision analysis, Mathematical optimization, Weighting, Discounting, Iterative method, Markov chain, Markov decision process, Decision theory, Reward-based selection, Mathematics
Authors: 3
Name             Order  Citations  PageRank
Dmitry Krass     1      4838       2.08
Jerzy A. Filar   2      1202       3.36
Sagnik S. Sinha  3      10         3.39