Title
Self-Supervised Online Reward Shaping in Sparse-Reward Environments
Abstract
We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards. The proposed framework alternates between classification-based reward inference and policy update steps: the original sparse reward provides a self-supervisory signal for reward inference by ranking the trajectories that the agent observes, while the policy update is performed with the newly inferred, typically dense, reward function. We present theory showing that, under certain conditions, this alteration of the reward function does not change the optimal policy of the original MDP, while potentially speeding up learning significantly. Experimental results on several sparse-reward environments demonstrate that, across multiple domains, the proposed algorithm is not only significantly more sample efficient than a standard RL baseline using sparse rewards, but at times also matches the sample efficiency achieved with hand-designed dense reward functions.
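For concreteness, the alternation described in the abstract could be sketched as follows. This is a minimal, hypothetical Python/PyTorch sketch, assuming a Bradley-Terry-style ranking loss for the classification-based reward inference step; RewardNet, sors_step, trajectory_pairs, and policy_update are illustrative names, not the authors' implementation.

    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        # Dense per-state reward model; this architecture is an assumption.
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, states):                # states: (T, obs_dim)
            return self.net(states).squeeze(-1)   # per-step rewards, shape (T,)

    def ranking_loss(reward_net, traj_lo, traj_hi):
        # Classification-based reward inference: the trajectory with the higher
        # sparse return (traj_hi) should also get the higher predicted return.
        ret_lo = reward_net(traj_lo).sum()
        ret_hi = reward_net(traj_hi).sum()
        logits = torch.stack([ret_lo, ret_hi]).unsqueeze(0)   # shape (1, 2)
        target = torch.tensor([1])                            # class 1 = traj_hi preferred
        return nn.functional.cross_entropy(logits, target)

    def sors_step(reward_net, optimizer, trajectory_pairs, policy_update):
        # (1) Reward inference: rank observed trajectories by their original
        #     sparse returns and fit the dense reward model to those rankings.
        for (traj_a, ret_a), (traj_b, ret_b) in trajectory_pairs:
            if ret_a == ret_b:
                continue                          # ties carry no ranking signal
            lo, hi = (traj_a, traj_b) if ret_a < ret_b else (traj_b, traj_a)
            optimizer.zero_grad()
            ranking_loss(reward_net, lo, hi).backward()
            optimizer.step()
        # (2) Policy update: run any off-the-shelf RL update, with rewards
        #     relabeled by the newly inferred dense reward function.
        policy_update(reward_fn=reward_net)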
Year
2021
DOI
10.1109/IROS51168.2021.9636020
Venue
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DocType
Conference
ISSN
2153-0858
Citations
0
PageRank
0.34
References
0
Authors
5
Name              Order  Citations  PageRank
Farzan Memarian   1      0          0.34
Wonjoon Goo       2      4          2.17
Rudolf Lioutikov  3      68         8.69
S. Niekum         4      165        23.73
Ufuk Topcu        5      1032       115.78