Reward Tuning for self-adaptive Policy in MDP based Distributed Decision-Making to ensure a Safe Mission Planning - Citegraph

Paper Info

Title
Reward Tuning for self-adaptive Policy in MDP based Distributed Decision-Making to ensure a Safe Mission Planning

Abstract
The Markov Decision Process (MDP) has become a standard model for sequential decision making under uncertainty. Planning with an MDP yields the sequence of actions that achieves the mission goal efficiently. Often a single agent makes the decisions and performs a single action. However, in fields such as robotics, several actions can be executed simultaneously. Moreover, as missions grow more complex, decomposing an MDP into several sub-MDPs becomes necessary. The decomposition involves parallel decisions by different agents, but the execution of concurrent actions can lead to conflicts. In addition, system and sensor failures may occur during the mission and can have negative consequences (e.g. the crash of a UAV caused by a drop in battery charge). In this article, we present a new method to prevent behavior conflicts that can appear within distributed decision-making and, if needed, to favor specific actions so as to ensure the safety and the various requirements of the system. This method takes into account the constraints due to antagonistic actions, while additionally considering thresholds on transition functions to promote specific actions that guarantee the safety of the system. It then automatically computes the rewards of the different MDPs related to the mission in order to establish a safe plan. We validate this method on a case study of a UAV tracking mission. From the list of constraints identified for the mission, the rewards of the MDPs are recomputed in order to avoid all potential conflicts and violations of constraints related to the safety of the system, thereby ensuring a safe specification of the mission.

Field	Value
Year	2020
DOI	10.1109/DSN-W50199.2020.00025
Venue	2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
Keywords	Markov Decision Process,Concurrent Actions,Reward Tuning,Behavior Conflicts,Constraints on MDPs
DocType	Conference
ISSN	2325-6648
ISBN	978-1-7281-7264-4
Citations	0
PageRank	0.34
References	3
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Mohand Hamadouche	1	0	0.34
Catherine Dezan	2	0	0.34
Kalinka Regina Lucas Jaquie Castelo Branco	3	45	8.40
