Title
Reinforcement Learning endowed with safe veto policies to learn the control of Linked-Multicomponent Robotic Systems
Abstract
Reinforcement learning-based control of systems whose state space contains many Undesired Terminal States (UTS) suffers from severe convergence problems. We define UTS as terminal states without associated positive reward information. They appear when training over-constrained systems, where breaking a constraint means that all the effort invested in a learning episode is lost without gathering any constructive information about how to achieve the target task. The random exploration performed by RL algorithms is unfruitful until the system reaches a final state bearing some reward that can be used to update the state-action value functions; hence, UTS seriously impede the convergence of the learning process. The most efficient learning strategies avoid reaching any UTS, ensuring that each learning episode provides useful reward information. Safe Modular State Action Veto (Safe-MSAV) policies learn specifically how to avoid state transitions leading to a UTS. The application of Safe-MSAV makes state space exploration much more efficient; the larger the ratio of UTS to the total number of states, the greater the improvement. Safe-MSAV uses independent concurrent modules, each dealing with a separate kind of UTS. We report experiments on the control of Linked Multicomponent Robotic Systems (L-MCRS) showing a dramatic decrease in the computational resources required, yielding faster as well as more accurate results than conventional exploration strategies that do not implement explicit mechanisms to avoid falling into UTS.
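The paper's actual Safe-MSAV algorithm is not reproduced in this record; the following is a minimal, hypothetical sketch of the idea described in the abstract, assuming a tabular Q-learning agent whose exploration is filtered by independent concurrent veto modules, each one recording the state-action pairs it has observed to lead to its own kind of UTS. All class and method names (VetoModule, SafeVetoQAgent, learn_veto, etc.) are illustrative and not taken from the paper.

import random
from collections import defaultdict


class VetoModule:
    """Hypothetical veto module: remembers (state, action) pairs observed to
    lead to one specific kind of undesired terminal state (UTS)."""

    def __init__(self):
        self.vetoed = set()

    def learn_veto(self, state, action):
        self.vetoed.add((state, action))

    def allows(self, state, action):
        return (state, action) not in self.vetoed


class SafeVetoQAgent:
    """Sketch of tabular Q-learning whose exploration is filtered by
    independent concurrent veto modules, one per kind of UTS."""

    def __init__(self, actions, modules, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)      # Q(s, a) value table
        self.actions = list(actions)
        self.modules = list(modules)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def safe_actions(self, state):
        # Keep only actions that no module has vetoed in this state.
        safe = [a for a in self.actions
                if all(m.allows(state, a) for m in self.modules)]
        return safe or self.actions      # fall back if every action is vetoed

    def choose(self, state):
        candidates = self.safe_actions(state)
        if random.random() < self.epsilon:
            return random.choice(candidates)   # exploration within the safe set
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done, uts_kind=None):
        # Standard Q-learning backup on the value table.
        best_next = 0.0 if done else max(self.q[(next_state, a)]
                                         for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        # If the transition ended in a UTS, the matching module learns the veto
        # so that this state-action pair is avoided in later episodes.
        if uts_kind is not None:
            self.modules[uts_kind].learn_veto(state, action)

In a training loop, uts_kind would identify which constraint was broken whenever an episode terminates without reward, so each module only ever learns vetoes for its own class of failure.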
Year
2015
DOI
10.1016/j.ins.2015.04.005
Venue
Information Sciences
Keywords
Reinforcement Learning, Linked Multicomponent Robotic Systems, Safe exploration policies, Speeding convergence of RL
Field
Convergence (routing), Robotic systems, State space exploration, Constructive, Artificial intelligence, Modular design, State space, Veto, Machine learning, Mathematics, Reinforcement learning
DocType
Journal
Volume
317
Issue
C
ISSN
0020-0255
Citations
2
PageRank
0.37
References
31
Authors
5
Name | Order | Citations | PageRank
Borja Fernández-Gauna | 1 | 31 | 5.82
Manuel Graña | 2 | 1367 | 156.11
José Manuel López-Guede | 3 | 50 | 18.06
Ismael Etxeberria | 4 | 17 | 5.83
Igor Ansoategui | 5 | 8 | 1.64