Title
HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing
Abstract
In this work, we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. First, we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced-order model of the drone. Next, we train a deep network policy in a supervised fashion to mimic the HJB policy. Finally, we further train this network using policy gradient reinforcement learning on the full drone dynamics model with a low-level feedback controller in the loop. This gives a deep network policy for controlling the drone to pass through a single gate. In a race course, this policy is applied successively to each new oncoming gate to guide the drone through the course. The resulting policy completes a high-fidelity AirSim drone race with 12 gates in 34.89 s on average, outracing a model-based HJB policy by 33.20 s, a supervised learning policy by 1.24 s, and a trajectory planning policy by 12.99 s, while a model-free RL policy was never able to complete the race.
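The first stage described in the abstract, solving a discretized HJB equation for a minimum-time closed-loop policy, can be illustrated with a small sketch. This is not the authors' implementation: the model here is an assumed toy 1-D double integrator on an integer grid (position p, velocity v, acceleration commands of -1, 0, or +1), the "gate" is taken to be the state (0, 0), and the names `step`, `solve_min_time`, and `rollout` are hypothetical. The supervised imitation and policy gradient stages are not shown.

```python
import numpy as np

# Assumed toy reduced-order model (not the paper's drone model):
# 1-D double integrator on an integer grid, so every transition lands
# exactly on a grid point and no interpolation is needed.
P_MAX, V_MAX = 10, 3          # grid bounds for position and velocity
CONTROLS = (-1, 0, 1)         # admissible acceleration commands
DT = 1.0                      # time cost of one step


def step(p, v, a):
    """Apply one control step; return None if the state leaves the grid."""
    v_next = v + a
    p_next = p + v_next
    if abs(v_next) > V_MAX or abs(p_next) > P_MAX:
        return None
    return p_next, v_next


def solve_min_time():
    """Value iteration on the discrete Bellman equation
    V(x) = min_a [DT + V(f(x, a))], with V = 0 at the target state (0, 0)."""
    shape = (2 * P_MAX + 1, 2 * V_MAX + 1)
    V = np.full(shape, np.inf)
    policy = np.zeros(shape, dtype=int)
    V[P_MAX, V_MAX] = 0.0     # array index of the state (p=0, v=0)
    changed = True
    while changed:            # sweep until no value improves
        changed = False
        for p in range(-P_MAX, P_MAX + 1):
            for v in range(-V_MAX, V_MAX + 1):
                if (p, v) == (0, 0):
                    continue
                for a in CONTROLS:
                    nxt = step(p, v, a)
                    if nxt is None:
                        continue
                    cost = DT + V[nxt[0] + P_MAX, nxt[1] + V_MAX]
                    if cost < V[p + P_MAX, v + V_MAX]:
                        V[p + P_MAX, v + V_MAX] = cost
                        policy[p + P_MAX, v + V_MAX] = a
                        changed = True
    return V, policy


def rollout(policy, p, v, max_steps=100):
    """Follow the closed-loop policy from (p, v); return the step count."""
    steps = 0
    while (p, v) != (0, 0) and steps < max_steps:
        nxt = step(p, v, int(policy[p + P_MAX, v + V_MAX]))
        if nxt is None:
            raise ValueError("start state cannot reach the target on this grid")
        p, v = nxt
        steps += 1
    return steps


V, policy = solve_min_time()
print("time-to-go from (p=4, v=0):", V[4 + P_MAX, 0 + V_MAX])
print("steps taken by the policy:", rollout(policy, 4, 0))
```

In the pipeline the abstract describes, a table like `policy` would serve as the supervision target for the deep network before reinforcement learning on the full dynamics; in this sketch it is only rolled out directly to check that the time taken matches the computed value function.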
Year: 2021
DOI: 10.15607/RSS.2021.XVII.062
Venue: Robotics: Science and Systems XVII
DocType: Conference
ISSN: 2330-7668
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name            Order  Citations  PageRank
Keiko Nagami    1      0          0.34
Mac Schwager    2      0          0.68