Abstract |
---|
This paper presents a new theoretical analysis of the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2]. Unlike existing convergence proofs for adaptive dynamic programming (ADP) in the literature, we provide new insight into the error bound between the estimated value function and the expected value function. We then employ the critic network of the GrADP approach to approximate the Q value function, and use the action network to provide the control policy. The goal network provides the internal reinforcement signal for the critic network over time. Finally, we illustrate on a maze navigation example that the estimated Q value function approaches the expected value function within an arbitrarily small bound. |
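The abstract describes a three-network structure: a goal network producing an internal reinforcement signal, a critic network approximating the Q value function, and an action network providing the control policy. The following is a minimal, hedged sketch of that structure on a toy corridor "maze". The network shapes, learning rates, and the way the goal signal feeds the critic are illustrative assumptions rather than the paper's GrADP equations, and for brevity the action network is replaced by a greedy policy over the critic's Q-values.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, GOAL_STATE, GAMMA, EPS = 5, 4, 0.9, 0.2  # 1-D corridor, goal at right end

def step(s, a):                     # a: 0 = left, 1 = right
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    r = 1.0 if s2 == GOAL_STATE else 0.0
    return s2, r, s2 == GOAL_STATE

def onehot(s):
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

class Net:
    """Minimal linear function approximator trained by gradient descent."""
    def __init__(self, n_in, n_out, lr):
        self.W = rng.normal(scale=0.05, size=(n_out, n_in))
        self.lr = lr
    def __call__(self, x):
        return self.W @ x
    def sgd(self, x, grad_out):     # one step on 0.5 * error^2
        self.W -= self.lr * np.outer(grad_out, x)

goal_net = Net(N_STATES, 1, 0.05)      # internal reinforcement signal g(x)
critic = Net(N_STATES + 1, 2, 0.1)     # Q(x, g): one output per action

def q_values(s):
    g = goal_net(onehot(s))[0]         # goal signal is an extra critic input here
    return critic(np.append(onehot(s), g)), g

for episode in range(500):
    s = int(rng.integers(N_STATES - 1))            # random non-goal start
    for t in range(50):
        q, g = q_values(s)
        a = int(np.argmax(q)) if rng.random() > EPS else int(rng.integers(2))
        s2, r, done = step(s, a)
        # Goal network: learn to predict the external reward; its output is
        # the "internal" signal the critic consumes (an assumed coupling).
        goal_net.sgd(onehot(s), np.array([g - r]))
        # Critic: one-step TD (Q-learning style) update.
        q2, _ = q_values(s2)
        target = r if done else r + GAMMA * np.max(q2)
        grad = np.zeros(2)
        grad[a] = q[a] - target
        critic.sgd(np.append(onehot(s), g), grad)
        s = s2
        if done:
            break

# Greedy rollout from the far end of the corridor.
s, path = 0, [0]
for _ in range(20):
    q, _ = q_values(s)
    s, _, done = step(s, int(np.argmax(q)))
    path.append(s)
    if done:
        break
print(path)
```

In the paper's actual design the internal signal enters the critic's temporal-difference target itself; the sketch above only demonstrates the division of labor among the three components.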
Year | Venue | Keywords |
---|---|---|
2015 | 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | Adaptive dynamic programming (ADP), goal representation ADP (GrADP), reinforcement learning, theoretical analysis, maze navigation |
Field | DocType | ISSN
---|---|---|
Convergence (routing), Dynamic programming, Optimal control, Computer science, Q value, Bellman equation, Mathematical proof, Expected value, Artificial intelligence, Reinforcement, Machine learning | Conference | 2161-4393
Citations | PageRank | References
---|---|---|
1 | 0.36 | 29
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhen Ni | 1 | 525 | 33.47 |
Xiangnan Zhong | 2 | 346 | 16.35 |
Haibo He | 3 | 3653 | 213.96 |