Abstract |
---|
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to "neglect" the future by concentrating on short-term rewards, while the second tends to do the opposite. We consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include the following: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies, and an optimal policy might not exist. We present an iterative algorithm for computing an epsilon-optimal nonstationary policy with a very simple structure. |
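A minimal numerical sketch of the kind of criterion the abstract describes, evaluated for one fixed stationary policy on a two-state chain. The transition matrix, rewards, discount factor, weight `lam`, and the `(1 - beta)` normalization of the discounted value are illustrative assumptions, not the paper's precise definition.

```python
import numpy as np

# Transition matrix and one-step rewards under a fixed stationary policy
# (hypothetical two-state example).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 5.0])
beta = 0.9  # discount factor (assumed)

# Discounted reward: solve (I - beta * P) v = r.
v_disc = np.linalg.solve(np.eye(2) - beta * P, r)

# Long-run average reward: g = pi_star @ r, where pi_star is the
# stationary distribution of P (unichain case), i.e. pi_star P = pi_star.
evals, evecs = np.linalg.eig(P.T)
pi_star = np.real(evecs[:, np.argmax(np.real(evals))])
pi_star /= pi_star.sum()
g = pi_star @ r  # here pi_star = (2/3, 1/3), so g = 7/3

# Weighted criterion (illustrative form): convex combination of the
# normalized discounted value and the average reward.
lam = 0.5
weighted = lam * (1 - beta) * v_disc + (1 - lam) * g
print(weighted)
```

Varying `lam` between 0 and 1 shifts the emphasis from the long-term average reward toward the short-term discounted reward, which is the trade-off the abstract attributes to the weighted criterion.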
Year | DOI | Venue |
---|---|---|
1992 | 10.1287/opre.40.6.1180 | Operations Research |

Field | DocType | Volume |
---|---|---|
Econometrics, Decision analysis, Mathematical optimization, Weighting, Discounting, Iterative method, Markov chain, Markov decision process, Decision theory, Reward-based selection, Mathematics | Journal | 40 |

Issue | ISSN | Citations |
---|---|---|
6 | 0030-364X | 10 |

PageRank | References | Authors |
---|---|---|
3.39 | 2 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Dmitry Krass | 1 | 483 | 82.08 |
Jerzy A. Filar | 2 | 120 | 23.36 |
Sagnik S. Sinha | 3 | 10 | 3.39 |