Title
Meta Dynamic Pricing: Learning Across Experiments.
Abstract
We study the problem of learning emph{across} a sequence of price experiments for related products, focusing on implementing the Thompson sampling algorithm for dynamic pricing. We consider a practical formulation of this problem where the unknown parameters of the demand function for each product come from a prior that is shared across products, but is unknown a priori. Our main contribution is a meta dynamic pricing algorithm that learns this prior online while solving a sequence of non-overlapping pricing experiments (each with horizon $T$) for $N$ different products. Our algorithm addresses two challenges: (i) balancing the need to learn the prior (emph{meta-exploration}) with the need to leverage the current estimate of the prior to achieve good performance (emph{meta-exploitation}), and (ii) accounting for uncertainty in the estimated prior by appropriately widening the prior as a function of its estimation error, thereby ensuring convergence of each price experiment. We prove that the price of an unknown prior for Thompson sampling is negligible in experiment-rich environments (large $N$). In particular, our algorithmu0027s meta regret can be upper bounded by $widetilde{O}left(sqrt{NT}right)$ when the covariance of the prior is known, and $widetilde{O}left(N^{frac{3}{4}}sqrt{T}right)$ otherwise. Numerical experiments on synthetic and real auto loan data demonstrate that our algorithm significantly speeds up learning compared to prior-independent algorithms or a naive approach of greedily using the updated prior across products.
Year
Venue
DocType
2019
arXiv: Learning
Journal
Volume
Citations 
PageRank 
abs/1902.10918
0
0.34
References 
Authors
13
3
Name
Order
Citations
PageRank
Hamsa Bastani1162.72
David Simchi-Levi21449151.53
ruihao zhu3143.27