Dynamic Balancing for Model Selection in Bandits and RL

Paper Info

Title
Dynamic Balancing for Model Selection in Bandits and RL

Abstract
We propose a framework for model selection by combining base algorithms in stochastic bandits and reinforcement learning. We require a candidate regret bound for each base algorithm that may or may not hold. We select base algorithms to play in each round using a "balancing condition" on the candidate regret bounds. Our approach simultaneously recovers previous worst-case regret bounds and obtains much smaller regret in natural scenarios where some base learners significantly exceed their candidate bounds. Our framework is relevant in many settings, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and tuning the confidence parameters of algorithms such as LinUCB. Moreover, unlike recent efforts in model selection for linear stochastic bandits, our approach can be extended to adversarial rather than stochastic contexts.
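The balancing idea in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm. Each hypothetical base learner is a fixed Bernoulli arm with a candidate bound of the assumed form R_i(n) = c_i * sqrt(n). The balancing rule plays the active learner whose candidate bound, evaluated at its own play count, is smallest. An assumed Hoeffding-style test drops a learner once its average reward plus R_i(n_i)/n_i falls confidently below another learner's average, because such a learner cannot be meeting its candidate bound. The names `run_balancing` and `sqrt_bound` and all constants are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple

Bound = Callable[[int], float]


def sqrt_bound(c: float) -> Bound:
    """Hypothetical candidate regret bound R(n) = c * sqrt(n)."""
    return lambda n: c * math.sqrt(n)


def run_balancing(arm_means: List[float], bounds: List[Bound], horizon: int,
                  delta: float = 0.05, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Toy balancing-plus-elimination loop (illustrative, not the paper's exact method).

    Assumption: toy learner i always pulls a Bernoulli arm with mean arm_means[i].
    Returns the surviving learners and the play count of every learner.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    plays = [0] * k
    totals = [0.0] * k
    active = list(range(k))

    def conf(i: int) -> float:
        # Hoeffding-style confidence width; assumes rewards lie in [0, 1].
        return math.sqrt(2 * math.log(1 / delta) / plays[i])

    for _ in range(horizon):
        # Balancing: play the active learner with the smallest candidate bound
        # at its own play count, which keeps R_i(n_i) roughly equal.
        i = min(active, key=lambda j: bounds[j](plays[j]))
        reward = 1.0 if rng.random() < arm_means[i] else 0.0
        plays[i] += 1
        totals[i] += reward

        played = [j for j in active if plays[j] > 0]
        if len(played) < 2:
            continue
        # Pessimistic estimate of the best achievable average reward.
        best_lower = max(totals[j] / plays[j] - conf(j) for j in played)
        # If learner j's candidate bound held, its mean plus R_j(n_j)/n_j plus
        # a confidence width should not fall below best_lower; otherwise drop j.
        active = [j for j in active
                  if plays[j] == 0
                  or totals[j] / plays[j] + bounds[j](plays[j]) / plays[j] + conf(j)
                  >= best_lower]
    return active, plays


if __name__ == "__main__":
    # Learner 0 claims a sqrt(n) bound but loses about 0.3 reward per round
    # relative to learner 1, so its claim is violated and it should be dropped.
    survivors, counts = run_balancing([0.5, 0.8], [sqrt_bound(1.0), sqrt_bound(3.0)],
                                      horizon=5000)
    print(survivors, counts)
```

Because the learner with the highest pessimistic estimate always passes the test, the active set never becomes empty. Under balancing, learner 1 (the looser bound) is played roughly one ninth as often as learner 0 until learner 0 is ruled out, after which learner 1 receives all remaining rounds.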

Year: 2021
Venue: International Conference on Machine Learning, Vol. 139
DocType: Conference
Volume: 139
ISSN: 2640-3498
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors

Name	Author order	Citations	PageRank
Ashok Cutkosky	1	14	10.02
Christoph Dann	2	91	11.83
Abhimanyu Das	3	314	22.43
Claudio Gentile	4	1166	107.46
Aldo Pacchiano	5	10	11.62
Manish Purohit	6	46	10.84
