Title: Bayesian Exploration for Approximate Dynamic Programming
Abstract
Approximate dynamic programming has been applied to solve large-scale resource allocation problems in many domains, including transportation, energy, and healthcare. The main algorithmic challenge in these applications is to accurately estimate the value of a resource in a certain state (for example, a battery that is half full, or a truck en route to a certain destination). This “value function” must be learned as part of the decision-making process, giving rise to the “exploration/exploitation” challenge: we may, and often should, choose to experiment with seemingly suboptimal actions because they have high potential to be better than we believe. In “Bayesian Exploration for Approximate Dynamic Programming,” Ilya O. Ryzhov, Martijn R. K. Mes, Warren B. Powell, and Gerald van den Berg present a principled framework for modeling uncertainty about the value function and measuring the potential of a state to improve a decision-making policy. This method is scalable and performs well in experiments.

Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other domains. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and nonparametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, although more computationally intensive, it is highly competitive against other exploration strategies.

The e-companion is available at https://doi.org/10.1287/opre.2018.1772.
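To make the two ideas in the abstract concrete, the following is a minimal sketch, assuming a linear-Gaussian belief model over a parametric value function approximation V(s) = phi(s)^T theta. The class name BayesianValueBelief, its parameters, and the simple square-root uncertainty bonus are hypothetical illustrations; the paper's actual value-of-information exploration strategy is more involved and is not reproduced here.

```python
import numpy as np

# Illustrative sketch only: NOT the algorithm from the paper. It shows, under
# simplifying assumptions, (1) a Bayesian (linear-Gaussian) belief about a
# value function V(s) ~ phi(s)^T theta with theta ~ N(mu, Sigma), and
# (2) action selection with an uncertainty bonus standing in for a
# value-of-information calculation. All names here are hypothetical.

class BayesianValueBelief:
    def __init__(self, num_features, prior_var=10.0, noise_var=1.0):
        self.mu = np.zeros(num_features)               # posterior mean of theta
        self.Sigma = prior_var * np.eye(num_features)  # posterior covariance
        self.noise_var = noise_var                     # assumed observation noise

    def mean(self, phi):
        # Current point estimate of the value at features phi.
        return phi @ self.mu

    def variance(self, phi):
        # Posterior variance of the value estimate at phi; off-diagonal terms
        # of Sigma let one observation update beliefs about similar states.
        return phi @ self.Sigma @ phi

    def update(self, phi, observed_value):
        # Conjugate Gaussian (rank-1) update after a noisy value sample.
        Sigma_phi = self.Sigma @ phi
        gain = Sigma_phi / (self.noise_var + phi @ Sigma_phi)
        self.mu = self.mu + gain * (observed_value - phi @ self.mu)
        self.Sigma = self.Sigma - np.outer(gain, Sigma_phi)

def choose_action(belief, rewards, next_state_features, bonus=1.0):
    # Score = immediate reward + estimated downstream value + uncertainty
    # bonus (a crude proxy for the value of exploring that action).
    scores = [r + belief.mean(phi) + bonus * np.sqrt(belief.variance(phi))
              for r, phi in zip(rewards, next_state_features)]
    return int(np.argmax(scores))

# Tiny usage example with made-up data.
rng = np.random.default_rng(0)
belief = BayesianValueBelief(num_features=3)
features = [rng.standard_normal(3) for _ in range(4)]   # 4 candidate actions
rewards = [1.0, 0.5, 0.8, 0.2]
a = choose_action(belief, rewards, features)
belief.update(features[a], observed_value=2.0)          # learn from the outcome
```

The correlated covariance matrix Sigma is what allows a single observation to update beliefs about many states at once, which is consistent with the “correlated beliefs” keyword below; a genuine value-of-information policy would score each action by the expected improvement in future decisions rather than by a fixed bonus.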
Year: 2019
DOI: 10.1287/opre.2018.1772
Venue: Operations Research
Keywords: approximate dynamic programming, optimal learning, Bayesian learning, correlated beliefs, value of information
Field: Dynamic programming, Mathematical optimization, Bayesian inference, Bellman equation, Resource allocation, Value of information, Mathematics, Bayesian probability, Scalability
DocType: Journal
Volume: 67
Issue: 1
ISSN: 0030-364X
Citations: 0
PageRank: 0.34
References: 31
Authors: 4
1. Ilya O. Ryzhov (Citations: 128, PageRank: 14.12)
2. Martijn Mes (Citations: 77, PageRank: 10.58)
3. Warren B. Powell (Citations: 1614, PageRank: 151.46)
4. Gerald van den Berg (Citations: 0, PageRank: 0.34)