Many-objective stochastic path finding using reinforcement learning. - Citegraph

Paper Info

Title
Many-objective stochastic path finding using reinforcement learning.

Abstract
A novel many-objective reinforcement learning algorithm is proposed.
A benchmark many-objective path finding problem is introduced.
We evaluated the algorithm on path finding problems with five and six objectives.
Total reward obtained, solution set quality, and episode duration were measured.
The proposed method outperforms the state of the art on all problems.

In this paper, we investigate solutions to path finding problems with many conflicting objectives and introduce a new model-free many-objective reinforcement learning algorithm, called Voting Q-learning. It can find a set of optimal policies in an initially unknown, stochastic environment with several conflicting objectives. Current methods for solving this type of problem rely on Pareto dominance to determine which actions are optimal. Pareto dominance becomes less effective as the number of objectives increases, and these methods ultimately select actions at random in environments where all potential actions are Pareto optimal. Alternative methods require interaction with a decision maker or a priori knowledge of the problem structure to guide them toward optimal solutions. This makes them unsuitable for fully autonomous use and for problems where the preferred solutions are initially unknown. Instead, we propose using voting methods from social choice theory to determine a set of Pareto optimal policies. These methods aggregate the preferences obtained by evaluating environment conditions for each objective. We demonstrate the effectiveness of this method on multiple deterministic and stochastic many-objective path finding problems. These problems are solved optimally without any advance knowledge of the problem or interaction with a decision maker, showing that our approach is the first to provide optimal performance for an autonomous, intelligent system operating in a many-objective environment.
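The abstract contrasts Pareto-dominance action selection with voting-based aggregation of per-objective preferences. The following Python sketch illustrates that contrast under stated assumptions, and it is not the paper's Voting Q-learning implementation. The Q-table layout (`QTable[state][action][objective]`) is hypothetical, as are the function names `dominates`, `pareto_optimal_actions`, `borda_scores`, `voted_action`, and `q_update`. The abstract does not say which voting method the paper uses, so the Borda count here is one illustrative choice. Bootstrapping each objective's target on the voted next action is also an assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: QTable[state][action][objective] -> estimated return for that objective.
QTable = Dict[str, List[List[float]]]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v (maximisation): no worse on every objective, better on at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal_actions(q: List[List[float]]) -> List[int]:
    """Indices of actions whose Q-vector is not dominated by any other action's Q-vector."""
    return [
        i for i, qi in enumerate(q)
        if not any(dominates(qj, qi) for j, qj in enumerate(q) if j != i)
    ]


def borda_scores(q: List[List[float]]) -> List[int]:
    """Borda count: each objective acts as a voter that ranks the actions by its own Q-value.

    With n actions, an objective's top-ranked action earns n-1 points, the next n-2, and so on.
    Ties within one objective are ordered by action index (sorted() stays stable with reverse=True).
    """
    n_actions, n_objectives = len(q), len(q[0])
    scores = [0] * n_actions
    for o in range(n_objectives):
        ranking = sorted(range(n_actions), key=lambda a: q[a][o], reverse=True)
        for rank, a in enumerate(ranking):
            scores[a] += n_actions - 1 - rank
    return scores


def voted_action(q: List[List[float]]) -> int:
    """Action with the highest Borda score; the lowest index wins a tie."""
    scores = borda_scores(q)
    return scores.index(max(scores))


def q_update(Q: QTable, s: str, a: int, rewards: Sequence[float], s_next: str,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One tabular Q-learning step per objective (terminal states not handled).

    Assumption: every objective bootstraps on the single action voted for in s_next.
    """
    a_next = voted_action(Q[s_next])
    for o, r in enumerate(rewards):
        target = r + gamma * Q[s_next][a_next][o]
        Q[s][a][o] += alpha * (target - Q[s][a][o])


if __name__ == "__main__":
    # One state, three actions, five objectives: no action dominates another.
    q = [
        [1.0, 0.0, 0.5, 0.2, 0.9],
        [0.0, 1.0, 0.6, 0.3, 0.1],
        [0.5, 0.5, 0.7, 0.8, 0.5],
    ]
    print(pareto_optimal_actions(q))  # [0, 1, 2]
    print(borda_scores(q))            # [4, 4, 7]
    print(voted_action(q))            # 2
```

In the five-objective example, every action is Pareto optimal. A dominance filter alone therefore leaves a three-way random choice, while the Borda tally favours action 2, which ranks near the top on most objectives without winning outright on many. Breaking per-objective ties by index keeps the sketch short. A fuller treatment might average tied ranks instead.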

Year: 2017
DOI: 10.1016/j.eswa.2016.10.045
Venue: Expert Syst. Appl.
Keywords: Many objective reinforcement learning, Stochastic path finding, Sequential decision making under uncertainty, Social choice theory
Field: Pathfinding, Social choice theory, Mathematical optimization, Voting, Computer science, A priori and a posteriori, Q-learning, Artificial intelligence, Solution set, Pareto principle, Machine learning, Reinforcement learning
DocType: Journal
Volume: 72
Issue: C
ISSN: 0957-4174
Citations: 4
PageRank: 0.41
References: 32
Authors: 3

Name	Order	Citations	PageRank
Bentz Tozer	1	5	0.78
Thomas A. Mazzuchi	2	236	36.86
Shahram Sarkani	3	151	27.80
