Title
Graph convolutional recurrent networks for reward shaping in reinforcement learning
Abstract
In this paper, we consider the problem of slow convergence in Reinforcement Learning (RL). Potential-based reward shaping techniques have been proposed as a solution, but learning a good potential function remains challenging and can be as hard as learning a value function from scratch. Our main contribution is a new reward-shaping scheme that combines (1) Graph Convolutional Recurrent Networks (GCRN), (2) an augmented Krylov basis, and (3) look-ahead advice to form the potential function. The proposed GCRN architecture combines Graph Convolutional Networks (GCN), which capture spatial dependencies, with Bi-Directional Gated Recurrent Units (Bi-GRUs), which account for temporal dependencies. The GCRN loss function incorporates the message-passing technique of Hidden Markov Models (HMM). Because the transition matrix of the environment is hard to compute, we estimate it with a Krylov basis, which outperforms existing approximation bases. Unlike existing potential functions that rely on states alone to perform reward shaping, ours uses both states and actions through the look-ahead advice mechanism to produce more precise advice. Our evaluations on the Atari 2600 and MuJoCo games show that our solution outperforms the state-of-the-art that utilizes GCN as the potential function in most games, learning faster while reaching higher rewards.
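The look-ahead advice mechanism mentioned in the abstract shapes the reward with a potential defined over state–action pairs rather than states alone, i.e. F(s, a, s', a') = γΦ(s', a') − Φ(s, a). A minimal sketch of that shaping term (not the paper's code; a random table stands in for the learned GCRN potential, and all names here are illustrative) is:

```python
import numpy as np

# Hypothetical illustration of potential-based reward shaping with
# look-ahead advice: the potential Phi is defined over state-action
# pairs, so the advice depends on the next action as well as the next
# state. A random table stands in for the learned GCRN output.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.99
phi = rng.normal(size=(n_states, n_actions))  # stand-in for Phi(s, a)

def shaped_reward(r, s, a, s_next, a_next):
    """Augment the environment reward r with the look-ahead shaping term
    F(s, a, s', a') = gamma * Phi(s', a') - Phi(s, a)."""
    return r + gamma * phi[s_next, a_next] - phi[s, a]

# One transition: shaping changes the immediate reward, but because the
# term is potential-based it leaves the optimal policy unchanged.
r_shaped = shaped_reward(1.0, s=0, a=1, s_next=2, a_next=0)
print(r_shaped)
```

A self-transition that repeats the same action receives only the discount-shrinkage of its own potential, (γ − 1)Φ(s, a), which is the standard sanity check that the shaping term telescopes along trajectories.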
Year
2022
DOI
10.1016/j.ins.2022.06.050
Venue
Information Sciences
Keywords
Reinforcement Learning, Reward Shaping, GCRN, Augmented Krylov, Look-Ahead Advice, Atari, MuJoCo
DocType
Journal
Volume
608
ISSN
0020-0255
Citations
0
PageRank
0.34
References
0
Authors
5
Name             Order  Citations  PageRank
Hani Sami        1      19         2.61
Jamal Bentahar   2      11079      6.78
Azzam Mourad     3      0          2.37
Hadi Otrok       4      0          0.34
Ernesto Damiani  5      39114      16.18