Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. - Citegraph

Paper Info

Title
Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning.

Abstract
For online advertising in e-commerce, the traditional problem is to assign the right ad to the right user on fixed ad slots. In this paper, we investigate the problem of advertising with adaptive exposure, in which the number of ad slots and their locations can change dynamically over time based on their scores relative to recommended products. To maintain user retention and long-term revenue, exposure must satisfy two types of constraints: query-level and day-level constraints. We model this problem as a constrained Markov decision process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning framework that decouples the original advertising exposure optimization problem into two relatively independent sub-optimization problems. We also propose a constrained hindsight experience replay mechanism to accelerate policy training. Experimental results on real-world datasets show that our method improves advertising revenue while satisfying constraints at different levels. In addition, the constrained hindsight experience replay mechanism significantly improves training speed and the stability of policy performance.
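
To make the idea of adaptive exposure under a query-level constraint concrete, here is a minimal sketch that ranks ads and recommended products together by score and caps the number of ads shown for one query. The names `Item`, `merge_with_ad_cap`, `page_size`, and `max_ads` are assumptions made for illustration and do not come from the paper. This is not the paper's algorithm: the day-level constraint and the reinforcement learning components are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A candidate for display (hypothetical structure, for illustration only)."""
    item_id: str
    score: float
    is_ad: bool


def merge_with_ad_cap(recs, ads, page_size, max_ads):
    """Rank ads and recommendations together by score.

    Ads take whatever positions their scores earn, so the number and
    location of ad slots vary per query, but at most `max_ads` ads are
    shown (an assumed form of a query-level constraint).
    """
    candidates = sorted(recs + ads, key=lambda it: it.score, reverse=True)
    page, ads_shown = [], 0
    for item in candidates:
        if len(page) == page_size:
            break
        if item.is_ad:
            if ads_shown == max_ads:
                continue  # cap reached: skip this ad, keep filling with products
            ads_shown += 1
        page.append(item)
    return page


recs = [Item("r1", 0.9, False), Item("r2", 0.6, False), Item("r3", 0.4, False)]
ads = [Item("a1", 0.8, True), Item("a2", 0.7, True), Item("a3", 0.5, True)]

page = merge_with_ad_cap(recs, ads, page_size=4, max_ads=2)
print([it.item_id for it in page])  # ['r1', 'a1', 'a2', 'r2']
```

Lowering `max_ads` to 1 in this toy example yields `['r1', 'a1', 'r2', 'r3']`: the second ad is dropped and a lower-scored product fills the position instead.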

Year	2018
Venue	arXiv: Learning
Field	Revenue, Mathematical optimization, Advertising, Markov decision process, Online advertising, Hindsight bias, Optimization problem, Mathematics, Reinforcement learning
DocType	Journal
Volume	abs/1809.03149
Citations	0
PageRank	0.34
References	0
Authors	11

Authors (11 rows)

Name	Order	Citations	PageRank
Weixun Wang	1	1	5.75
Junqi Jin	2	118	7.95
Jianye Hao	3	189	55.78
Chunjie Chen	4	18	8.88
Chuan Yu	5	7	5.46
Weinan Zhang	6	1228	97.24
Jun Wang	7	110	42.73
Yixi Wang	8	0	1.01
Han Li	9	9	1.24
Jian Xu	10	0	2.03
Kun Gai	11	312	20.61
