Title
Graph convolutional recurrent networks for reward shaping in reinforcement learning
Abstract
In this paper, we consider the problem of slow convergence in Reinforcement Learning (RL). Potential-based reward shaping techniques have been proposed as a solution, but learning a good potential function remains challenging and can be as hard as learning a value function from scratch. Our main contribution is a new reward-shaping scheme that combines (1) Graph Convolutional Recurrent Networks (GCRN), (2) an augmented Krylov basis, and (3) look-ahead advice to form the potential function. The proposed GCRN architecture combines Graph Convolutional Networks (GCN), which capture spatial dependencies, with Bi-Directional Gated Recurrent Units (Bi-GRUs), which account for temporal dependencies. The GCRN loss function incorporates the message-passing technique of Hidden Markov Models (HMM). Because the transition matrix of the environment is hard to compute, we estimate it with a Krylov basis, which outperforms existing approximation bases. Unlike existing potential functions that rely on states alone to perform reward shaping, ours uses both states and actions through the look-ahead advice mechanism to produce more precise advice. Our evaluations on the Atari 2600 and MuJoCo games show that our solution outperforms the state-of-the-art that utilizes GCN as the potential function in most games, learning faster while reaching higher rewards.
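The look-ahead advice mechanism mentioned in the abstract shapes the reward with a potential defined over state–action pairs rather than states alone, i.e. F(s, a, s', a') = γΦ(s', a') − Φ(s, a). A minimal sketch of that shaping term (not the paper's code; a random table stands in for the learned GCRN potential, and all names here are illustrative) is:

```python
import numpy as np

# Hypothetical illustration of potential-based reward shaping with
# look-ahead advice: the potential Phi is defined over state-action
# pairs, so the advice depends on the next action as well as the next
# state. A random table stands in for the learned GCRN output.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.99
phi = rng.normal(size=(n_states, n_actions))  # stand-in for Phi(s, a)

def shaped_reward(r, s, a, s_next, a_next):
    """Augment the environment reward r with the look-ahead shaping term
    F(s, a, s', a') = gamma * Phi(s', a') - Phi(s, a)."""
    return r + gamma * phi[s_next, a_next] - phi[s, a]

# One transition: shaping changes the immediate reward, but because the
# term is potential-based it leaves the optimal policy unchanged.
r_shaped = shaped_reward(1.0, s=0, a=1, s_next=2, a_next=0)
print(r_shaped)
```

A self-transition that repeats the same action receives only the discount-shrinkage of its own potential, (γ − 1)Φ(s, a), which is the standard sanity check that the shaping term telescopes along trajectories.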
Year
2022
DOI
10.1016/j.ins.2022.06.050
Venue
Information Sciences
Keywords
Reinforcement Learning, Reward Shaping, GCRN, Augmented Krylov, Look-Ahead Advice, Atari, MuJoCo
DocType
Journal
Volume
608
ISSN
0020-0255
Citations
0
PageRank
0.34
References
0
Authors
5
Name             Order  Citations  PageRank
Hani Sami        1      19         2.61
Jamal Bentahar   2      11079      6.78
Azzam Mourad     3      0          2.37
Hadi Otrok       4      0          0.34
Ernesto Damiani  5      39114      16.18