Title
Reinforcement Learning endowed with safe veto policies to learn the control of Linked-Multicomponent Robotic Systems
Abstract
Reinforcement learning-based control of systems whose state space contains many Undesired Terminal States (UTS) suffers from severe convergence problems. We define UTS as terminal states without associated positive reward information. They appear when training over-constrained systems, where breaking a constraint means that all the effort invested in a learning episode is lost without gathering any constructive information about how to achieve the target task. The random exploration performed by RL algorithms is unfruitful until the system reaches a final state bearing some reward that can be used to update the state-action value functions; hence, UTS seriously impede the convergence of the learning process. The most efficient learning strategies avoid reaching any UTS, ensuring that each learning episode provides useful reward information. Safe Modular State Action Veto (Safe-MSAV) policies learn specifically how to avoid state transitions leading to a UTS. The application of Safe-MSAV makes state space exploration much more efficient; the larger the ratio of UTS to the total number of states, the greater the improvement. Safe-MSAV uses independent concurrent modules, each dealing with a separate kind of UTS. We report experiments on the control of Linked Multicomponent Robotic Systems (L-MCRS) showing a dramatic decrease in the computational resources required, yielding faster as well as more accurate results than conventional exploration strategies that do not implement explicit mechanisms to avoid falling into UTS.
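The paper's actual Safe-MSAV algorithm is not reproduced in this record; the following is a minimal, hypothetical sketch of the idea described in the abstract, assuming a tabular Q-learning agent whose exploration is filtered by independent concurrent veto modules, each one recording the state-action pairs it has observed to lead to its own kind of UTS. All class and method names (VetoModule, SafeVetoQAgent, learn_veto, etc.) are illustrative and not taken from the paper.

import random
from collections import defaultdict


class VetoModule:
    """Hypothetical veto module: remembers (state, action) pairs observed to
    lead to one specific kind of undesired terminal state (UTS)."""

    def __init__(self):
        self.vetoed = set()

    def learn_veto(self, state, action):
        self.vetoed.add((state, action))

    def allows(self, state, action):
        return (state, action) not in self.vetoed


class SafeVetoQAgent:
    """Sketch of tabular Q-learning whose exploration is filtered by
    independent concurrent veto modules, one per kind of UTS."""

    def __init__(self, actions, modules, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)      # Q(s, a) value table
        self.actions = list(actions)
        self.modules = list(modules)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def safe_actions(self, state):
        # Keep only actions that no module has vetoed in this state.
        safe = [a for a in self.actions
                if all(m.allows(state, a) for m in self.modules)]
        return safe or self.actions      # fall back if every action is vetoed

    def choose(self, state):
        candidates = self.safe_actions(state)
        if random.random() < self.epsilon:
            return random.choice(candidates)   # exploration within the safe set
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done, uts_kind=None):
        # Standard Q-learning backup on the value table.
        best_next = 0.0 if done else max(self.q[(next_state, a)]
                                         for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        # If the transition ended in a UTS, the matching module learns the veto
        # so that this state-action pair is avoided in later episodes.
        if uts_kind is not None:
            self.modules[uts_kind].learn_veto(state, action)

In a training loop, uts_kind would identify which constraint was broken whenever an episode terminates without reward, so each module only ever learns vetoes for its own class of failure.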
Year
2015
DOI
10.1016/j.ins.2015.04.005
Venue
Information Sciences
Keywords
Reinforcement Learning, Linked Multicomponent Robotic Systems, Safe exploration policies, Speeding convergence of RL
Field
Convergence (routing), Robotic systems, State space exploration, Constructive, Artificial intelligence, Modular design, State space, Veto, Machine learning, Mathematics, Reinforcement learning
DocType
Journal
Volume
317
Issue
C
ISSN
0020-0255
Citations
2
PageRank
0.37
References
31
Authors
5
Name | Order | Citations | PageRank
Borja Fernández-Gauna | 1 | 31 | 5.82
Manuel Graña | 2 | 1367 | 156.11
José Manuel López-Guede | 3 | 50 | 18.06
Ismael Etxeberria | 4 | 17 | 5.83
Igor Ansoategui | 5 | 8 | 1.64