Title
LC-Learning: Phased Method for Average Reward Reinforcement Learning - Preliminary Results
Abstract
This paper presents two methods to accelerate LC-learning, a novel model-based average-reward reinforcement learning method that computes a bias-optimal policy in a cyclic domain. LC-learning computes the bias-optimal policy without any approximation, relying on the observation that only the optimal cycle needs to be searched to find a gain-optimal policy. However, its complexity is large, since it examines most combinations of actions to detect all cycles. In this paper, we first implement two pruning methods to prevent LC-learning's state explosion problem. Second, we compare the improved LC-learning with one of the fastest existing methods, Prioritized Sweeping, on a bus scheduling task. We show that LC-learning computes the bias-optimal policy more quickly than standard Prioritized Sweeping, and that it performs as well as the fully tuned version in the middle-sized case.
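The abstract's core idea, that a gain-optimal policy can be found by searching for the cycle with the highest average reward, can be illustrated with a toy sketch. The tiny deterministic MDP, its state names, and its rewards below are invented for illustration; this is not the paper's LC-learning algorithm (which adds pruning), only the underlying notion of enumerating action combinations and scoring the cycle each one induces:

```python
from itertools import product

# Hypothetical toy deterministic MDP: state -> action -> (next_state, reward).
# All states and rewards here are made up for illustration.
mdp = {
    0: {"a": (1, 1.0), "b": (2, 0.0)},
    1: {"a": (0, 2.0)},
    2: {"a": (0, 5.0)},
}

def gain_of(policy, start=0):
    """Follow a deterministic policy until a state repeats; return the
    average reward (gain) of the cycle that is eventually reached."""
    seen, rewards = {}, []
    s = start
    while s not in seen:
        seen[s] = len(rewards)
        s, r = mdp[s][policy[s]]
        rewards.append(r)
    cycle_rewards = rewards[seen[s]:]  # drop the transient prefix
    return sum(cycle_rewards) / len(cycle_rewards)

# Enumerate every deterministic policy (combination of actions) and keep
# the one whose induced cycle has the highest average reward (gain).
states = sorted(mdp)
best = max(
    (dict(zip(states, acts)) for acts in product(*(mdp[s] for s in states))),
    key=gain_of,
)
print(best, gain_of(best))  # the policy taking "b" in state 0 wins (gain 2.5)
```

Brute-force enumeration like this is exactly what makes the complexity large; the paper's pruning methods exist to avoid examining most of these combinations.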
Year: 2002
DOI: 10.1007/3-540-45683-X_24
Venue: PRICAI
Keywords: gain-optimal policy, prioritized sweeping, preliminary results, phased method, large complexity, improved lc-learning, normal prioritized sweeping, full-tuned version, cyclic domain, reinforcement learning, bias-optimal policy, bus scheduling task, average reward reinforcement, machine learning, artificial intelligence, markov decision process
Field: Markov process, Computer science, Scheduling (computing), Markov decision process, Q-learning, Artificial intelligence, Machine learning, Reinforcement learning
DocType: Conference
ISBN: 3-540-44038-0
Citations: 1
PageRank: 0.37
References: 7
Authors: 3
Name                 Order  Citations  PageRank
Taro Konda           1      1          23.78
Shinjiro Tensyo      2      1          0.37
Tomohiro Yamaguchi   3      34         12.21