Title: Off-policy Learning for Multiple Loggers
Abstract
Historical logs are widely used for evaluating and learning policies in interactive systems, e.g., recommendation, search, and online advertising. Since direct online policy learning usually harms the user experience, off-policy learning is preferable in real-world applications. Although some prior work exists, most of it focuses on learning from a single historical policy. In practice, however, a number of parallel experiments, e.g., multiple A/B tests, are often performed simultaneously. To make full use of such historical data, learning policies from multiple loggers becomes necessary. Motivated by this, in this paper we investigate off-policy learning when the training data come from multiple historical policies. Specifically, policies, e.g., neural networks, can be learned directly from multi-logger data with counterfactual estimators. To better understand the generalization ability of such estimators, we conduct a generalization error analysis for the empirical risk minimization problem. We then introduce the generalization error bound as a new risk function, which reduces to a constrained optimization problem. Finally, we give a learning algorithm for this constrained problem, in which we appeal to minimax formulations to control the constraints. Extensive experiments on benchmark datasets demonstrate that the proposed methods outperform state-of-the-art baselines.
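The counterfactual learning setup described above can be illustrated with a minimal sketch: pooling logged samples from several behavior policies and reweighting each sample by the propensity of the logger that produced it (naive pooled inverse propensity scoring). This is an assumption-laden illustration of the general multi-logger setting, not the paper's actual estimator; the function name `pooled_ips` and the toy data are hypothetical.

```python
import numpy as np

def pooled_ips(rewards, logging_probs, target_probs):
    """Naive pooled IPS value estimate over data from multiple loggers.

    rewards       -- observed reward for each logged (context, action) pair
    logging_probs -- propensity of the *logger that produced each sample*
    target_probs  -- probability the target policy assigns to the same action
    """
    # Importance weight per sample: target propensity / logging propensity.
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))

# Toy example: first two samples come from logger A (propensity 0.5),
# last two from logger B (propensity 0.25); the target policy plays each
# logged action with probability 0.5.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging_probs = np.array([0.5, 0.5, 0.25, 0.25])
target_probs = np.array([0.5, 0.5, 0.5, 0.5])
estimate = pooled_ips(rewards, logging_probs, target_probs)  # -> 1.25
```

Plugging this estimate in as the training objective (and minimizing it over a parameterized policy) is the empirical-risk-minimization step the abstract refers to; the paper then replaces the plain empirical risk with a generalization error bound and solves the resulting constrained problem.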
Year: 2019
DOI: 10.1145/3292500.3330864
Keywords: log data, multiple loggers, off-policy learning
Field: Data science, Computer science, Policy learning, Artificial intelligence, Machine learning
DocType: Conference
ISSN: 978-1-4503-6201-6
ISBN: 978-1-4503-6201-6
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name         | Order | Citations | PageRank
Li He        | 1     | 0         | 0.68
Long Xia     | 2     | 211       | 8.86
Wei Zeng     | 3     | 77        | 7.42
Zhi-Ming Ma  | 4     | 227       | 18.26
Yihong Zhao  | 5     | 18        | 1.79
Dawei Yin    | 6     | 866       | 61.99