Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward - Citegraph

Paper Info

Title
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

Abstract
We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different episodes and reward feedback is not always available to the decision-making agents. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, both in stationary and nonstationary environments of six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
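
The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea, not the paper's BerlinUCB algorithm. It assumes a disjoint LinUCB learner paired with an online k-means module; when a reward is hidden, the learner falls back on the running mean reward of the context's cluster for the chosen arm. The class name `EpisodicRewardLinUCB`, the imputation rule, and all parameters (`n_clusters`, `alpha`, `seed`) are hypothetical choices made for illustration.

```python
import numpy as np


class EpisodicRewardLinUCB:
    """Disjoint LinUCB with an online k-means side module (illustrative only)."""

    def __init__(self, n_arms, dim, n_clusters=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha
        # Standard LinUCB statistics, one (A, b) pair per arm.
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))
        # Assumed clustering module: randomly initialised online k-means.
        self.centroids = rng.normal(size=(n_clusters, dim))
        self.counts = np.zeros(n_clusters)
        # Running reward totals per (cluster, arm), used only for imputation.
        self.reward_sum = np.zeros((n_clusters, n_arms))
        self.reward_cnt = np.zeros((n_clusters, n_arms))

    def select(self, x):
        """Pick the arm with the highest upper confidence bound."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def _assign(self, x):
        """Assign x to its nearest centroid and move that centroid toward x."""
        c = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.counts[c] += 1
        self.centroids[c] += (x - self.centroids[c]) / self.counts[c]
        return c

    def update(self, x, arm, reward=None):
        """Update with an observed reward, or with a cluster-based guess."""
        c = self._assign(x)
        if reward is None:
            if self.reward_cnt[c, arm] == 0:
                return  # no side information for this cluster/arm yet
            # Assumption: impute the cluster's mean reward for this arm.
            reward = self.reward_sum[c, arm] / self.reward_cnt[c, arm]
        else:
            self.reward_sum[c, arm] += reward
            self.reward_cnt[c, arm] += 1
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, an agent would call `select(x)` on each context, then `update(x, arm, reward)` when feedback arrives or `update(x, arm)` when it does not. Other designs are equally plausible, for example down-weighting imputed rewards relative to observed ones.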

Year	DOI	Venue	DocType	Citations	PageRank	References	Authors
2020	10.1007/978-3-030-64984-5_32	Australasian Conference on Artificial Intelligence	Conference	2	0.40	0	1

Authors (1 rows)

Name	Order	Citations	PageRank
Baihan Lin	1	2	4.11
