Title
Revisiting the Softmax Bellman Operator: Theoretical Properties and Practical Benefits.
Abstract
The softmax function has primarily been employed in reinforcement learning (RL) to improve exploration and to provide a differentiable approximation to the max function, as also observed in the mellowmax paper by Asadi and Littman. This paper instead focuses on using the softmax function in the Bellman updates, independent of the exploration strategy. Our main theory provides a performance bound for the softmax Bellman operator and shows that it converges to the standard Bellman operator exponentially fast in the inverse temperature parameter. We also prove that, under certain conditions, the softmax operator can reduce overestimation error and gradient noise. A detailed comparison among different Bellman operators is then presented to show the trade-offs involved in selecting among them. We apply the softmax operator to deep RL by combining it with the deep Q-network (DQN) and double DQN algorithms in an off-policy fashion, and demonstrate that these variants often achieve better performance in several Atari games and compare favorably to their mellowmax counterparts.
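For a concrete picture of the operator the abstract describes, the sketch below shows how a softmax (Boltzmann-weighted) Bellman target could replace the hard max in a DQN-style update. This is a minimal NumPy illustration of the idea as summarized in the abstract, not the authors' implementation; the function name softmax_bellman_target, the beta inverse-temperature argument, and the array shapes are assumptions made for the example.

```python
import numpy as np

def softmax_bellman_target(q_next, rewards, dones, beta=5.0, gamma=0.99):
    """Bootstrapped target using a softmax Bellman backup.

    q_next  : (batch, n_actions) Q-values of the next states (e.g., from a target network)
    rewards : (batch,) immediate rewards
    dones   : (batch,) 1.0 where the episode terminated, else 0.0
    beta    : inverse temperature; as beta -> infinity the Boltzmann average
              approaches max_a Q(s', a), recovering the standard DQN target
    """
    # Numerically stable softmax weights over the action dimension.
    z = beta * (q_next - q_next.max(axis=1, keepdims=True))
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)
    # Boltzmann-weighted average of next-state action values,
    # used in place of the hard max of standard DQN.
    v_next = (w * q_next).sum(axis=1)
    return rewards + gamma * (1.0 - dones) * v_next

# Illustrative usage with random values:
q_next = np.random.randn(32, 4)
rewards = np.random.randn(32)
dones = np.zeros(32)
target = softmax_bellman_target(q_next, rewards, dones)
```

As the abstract notes, increasing beta makes this target approach the standard max-based Bellman target exponentially fast, which is the trade-off the paper analyzes.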
Year
2018
Venue
arXiv: Learning
DocType
Journal
Volume
abs/1812.00456
Citations
0
PageRank
0.34
References
0
Authors
3
Name | Order | Citations | PageRank
Zhao Song | 1 | 21 | 8.86
Ronald Parr | 2 | 2428 | 186.85
L. Carin | 3 | 4603 | 339.36