Deep Learning with Logged Bandit Feedback - Citegraph

Paper Info

Title
Deep Learning with Logged Bandit Feedback

Abstract
We propose a new output layer for deep networks that permits the use of logged contextual bandit feedback for training. Such contextual bandit feedback can be available in huge quantities (e.g., logs of search engines, recommender systems) at little cost, opening up a path for training deep networks on orders of magnitude more data. To this effect, we propose a counterfactual risk minimization approach for training deep networks using an equivariant empirical risk estimator with variance regularization, BanditNet, and show how the resulting objective can be decomposed in a way that allows stochastic gradient descent training. We empirically demonstrate the effectiveness of the method in two scenarios. First, we show how deep networks -- ResNets in particular -- can be trained for object recognition without conventionally labeled images. Second, we learn to place banner ads based on propensity-logged click logs, where BanditNet substantially improves on the state-of-the-art.
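
The "equivariant" property named in the abstract can be illustrated with a small sketch. It assumes the estimator is a self-normalized inverse-propensity-scored (SNIPS) estimate, which may differ in detail from the paper's exact objective, and it leaves out the variance regularization and the SGD decomposition. The function names and the toy log below are hypothetical, chosen only for illustration.

```python
def ips_risk(losses, new_probs, logged_propensities):
    """Vanilla inverse-propensity-scored (IPS) risk estimate."""
    weights = [q / p for q, p in zip(new_probs, logged_propensities)]
    return sum(w * d for w, d in zip(weights, losses)) / len(losses)


def snips_risk(losses, new_probs, logged_propensities):
    """Self-normalized IPS: divide by the sum of importance weights
    rather than by the number of logged samples."""
    weights = [q / p for q, p in zip(new_probs, logged_propensities)]
    return sum(w * d for w, d in zip(weights, losses)) / sum(weights)


# Hypothetical toy log: one entry per logged (context, action) pair.
losses = [0.0, 1.0, 1.0, 0.0]          # observed loss for the logged action
new_probs = [0.9, 0.2, 0.1, 0.6]       # new policy's probability of that action
propensities = [0.5, 0.5, 0.25, 0.5]   # logging policy's probability of that action

# Shift every loss by the same constant.
shift = 1.0
shifted_losses = [d + shift for d in losses]

ips_change = ips_risk(shifted_losses, new_probs, propensities) - ips_risk(
    losses, new_probs, propensities
)
snips_change = snips_risk(shifted_losses, new_probs, propensities) - snips_risk(
    losses, new_probs, propensities
)

print("IPS change under shift:  ", ips_change)
print("SNIPS change under shift:", snips_change)
```

In this sketch, adding a constant to every loss moves the self-normalized estimate by that same constant, while the vanilla IPS estimate moves by a different amount whenever the importance weights do not average to one.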

Year	2018
Venue	International Conference on Learning Representations
Field	Recommender system, Stochastic gradient descent, Computer science, Minification, Regularization (mathematics), Artificial intelligence, Deep learning, Web banner, Machine learning, Cognitive neuroscience of visual object recognition, Estimator
DocType	Conference
Citations	8
PageRank	0.46
References	8
Authors	2

Authors (2 rows)


Name	Order	Citations	PageRank
Thorsten Joachims	1	17387	1254.06
Maarten de Rijke	2	6516	509.76
