Title
LC-Learning: Phased Method for Average Reward Reinforcement Learning - Preliminary Results
Abstract
This paper presents two methods to accelerate LC-learning, a novel model-based average-reward reinforcement learning method that computes a bias-optimal policy in a cyclic domain. LC-learning computes the bias-optimal policy without any approximation, relying on the observation that only the optimal cycle needs to be searched to find a gain-optimal policy. However, its complexity is large, since it examines most combinations of actions to detect all cycles. In this paper, we first implement two pruning methods to prevent LC-learning's state explosion problem. Second, we compare the improved LC-learning with one of the fastest existing methods, Prioritized Sweeping, on a bus scheduling task. We show that LC-learning computes the bias-optimal policy more quickly than standard Prioritized Sweeping, and that it performs as well as the fully tuned version in the middle-sized case.
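The abstract's core idea, that a gain-optimal policy can be found by searching for the cycle with the highest average reward, can be illustrated with a toy sketch. The tiny deterministic MDP, its state names, and its rewards below are invented for illustration; this is not the paper's LC-learning algorithm (which adds pruning), only the underlying notion of enumerating action combinations and scoring the cycle each one induces:

```python
from itertools import product

# Hypothetical toy deterministic MDP: state -> action -> (next_state, reward).
# All states and rewards here are made up for illustration.
mdp = {
    0: {"a": (1, 1.0), "b": (2, 0.0)},
    1: {"a": (0, 2.0)},
    2: {"a": (0, 5.0)},
}

def gain_of(policy, start=0):
    """Follow a deterministic policy until a state repeats; return the
    average reward (gain) of the cycle that is eventually reached."""
    seen, rewards = {}, []
    s = start
    while s not in seen:
        seen[s] = len(rewards)
        s, r = mdp[s][policy[s]]
        rewards.append(r)
    cycle_rewards = rewards[seen[s]:]  # drop the transient prefix
    return sum(cycle_rewards) / len(cycle_rewards)

# Enumerate every deterministic policy (combination of actions) and keep
# the one whose induced cycle has the highest average reward (gain).
states = sorted(mdp)
best = max(
    (dict(zip(states, acts)) for acts in product(*(mdp[s] for s in states))),
    key=gain_of,
)
print(best, gain_of(best))  # the policy taking "b" in state 0 wins (gain 2.5)
```

Brute-force enumeration like this is exactly what makes the complexity large; the paper's pruning methods exist to avoid examining most of these combinations.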
Year: 2002
DOI: 10.1007/3-540-45683-X_24
Venue: PRICAI
Keywords: gain-optimal policy, prioritized sweeping, preliminary results, phased method, large complexity, improved lc-learning, normal prioritized sweeping, full-tuned version, cyclic domain, reinforcement learning, bias-optimal policy, bus scheduling task, average reward reinforcement, machine learning, artificial intelligence, markov decision process
Field: Markov process, Computer science, Scheduling (computing), Markov decision process, Q-learning, Artificial intelligence, Machine learning, Reinforcement learning
DocType: Conference
ISBN: 3-540-44038-0
Citations: 1
PageRank: 0.37
References: 7
Authors: 3
Name                 Order  Citations  PageRank
Taro Konda           1      1          23.78
Shinjiro Tensyo      2      1          0.37
Tomohiro Yamaguchi   3      34         12.21