Abstract | ||
---|---|---|
Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data. However, when the reward model is not well-specified, the bandit algorithm may incur unexpected regret, so recent work has focused on algorithms that are robust to misspecification. We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase. Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level. Compared to prior work, we attain similar regret guarantees, but we do no rely on a master algorithm, and do not require more robust oracles like online or constrained regression oracles (e.g., (Foster et al., 2020a); (Krishnamurthy et al., 2020)). This allows us to design algorithms for more general function approximation classes. |
Year | Venue | DocType |
---|---|---|
2021 | INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | Conference |
Volume | ISSN | Citations |
139 | 2640-3498 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sanath Kumar Krishnamurthy | 1 | 1 | 1.37 |
Vitor Hadad | 2 | 0 | 1.01 |
Susan Athey | 3 | 23 | 4.67 |