Title
Word clustering for a word bi-gram model
Abstract
In this paper we describe a word clustering method for class-based n-gram models. The clustering criterion is the entropy on a corpus different from the corpus used for n-gram model estimation. The search method is based on a greedy algorithm. We applied this method to the Japanese EDR corpus and the English Penn Treebank corpus. The perplexities of the word-based n-gram model on the EDR corpus and the Penn Treebank are 153.1 and 203.5, respectively. Those of the class-based n-gram model estimated through our method are 146.4 and 136.0, respectively. The results show that our clustering method is better than Brown's method and Ney's method, called leaving-one-out.
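The idea in the abstract can be illustrated with a minimal sketch; it is not the paper's exact procedure. The abstract states only that the criterion is entropy on a corpus separate from the estimation corpus and that the search is greedy. Everything else here is an assumption made for illustration: the function names `cross_entropy` and `greedy_cluster`, the add-alpha smoothing, the singleton starting classes, and the single pass over words in descending frequency order. The class bigram model used is P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i).

```python
import math
from collections import Counter


def cross_entropy(train, heldout, word2class, alpha=0.1):
    """Bits per token of `heldout` under a class bigram model estimated on `train`.

    Add-alpha smoothing is an assumption of this sketch, used only to keep
    every probability non-zero.
    """
    n_classes = len(set(word2class.values()))
    members = Counter(word2class.values())            # words per class
    word_count = Counter(train)
    class_count = Counter(word2class[w] for w in train)
    class_bigram = Counter(
        (word2class[a], word2class[b]) for a, b in zip(train, train[1:])
    )
    prev_count = Counter(word2class[a] for a in train[:-1])

    total, n = 0.0, 0
    for a, b in zip(heldout, heldout[1:]):
        ca, cb = word2class[a], word2class[b]
        # P(c_i | c_{i-1}), smoothed over the current set of classes
        p_class = (class_bigram[(ca, cb)] + alpha) / (prev_count[ca] + alpha * n_classes)
        # P(w_i | c_i), smoothed over the members of the class
        p_word = (word_count[b] + alpha) / (class_count[cb] + alpha * members[cb])
        total -= math.log2(p_class * p_word)
        n += 1
    return total / n


def greedy_cluster(train, heldout, alpha=0.1):
    """Greedily merge words into classes while held-out entropy decreases."""
    vocab = sorted(set(train) | set(heldout))
    word2class = {w: i for i, w in enumerate(vocab)}   # one class per word
    best = cross_entropy(train, heldout, word2class, alpha)
    freq = Counter(train)
    for w in sorted(vocab, key=lambda v: -freq[v]):    # frequent words first
        home = word2class[w]
        best_class = home
        for c in sorted(set(word2class.values()) - {home}):
            word2class[w] = c                          # tentative move
            h = cross_entropy(train, heldout, word2class, alpha)
            if h < best - 1e-12:
                best, best_class = h, c
        word2class[w] = best_class                     # keep the best move, if any
    return word2class, best


if __name__ == "__main__":
    # Toy corpora, invented for this example only.
    train = "the cat sat on the mat the dog sat on the rug".split()
    heldout = "the dog sat on the mat the cat sat on the rug".split()
    mapping, entropy = greedy_cluster(train, heldout)
    print("classes:", mapping)
    print("held-out perplexity:", 2 ** entropy)
```

Perplexity is 2 raised to the cross-entropy in bits, so lowering held-out entropy lowers the perplexity the abstract reports. The sketch recomputes all counts for every tentative move, which is only practical for toy data.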
Year: 1998
Venue: ICSLP
Keywords: greedy algorithm
DocType: Conference
Citations: 3
PageRank: 0.46
References: 4
Authors: 3
Name | Order | Citations | PageRank
Shinsuke Mori | 1 | 474 | 47.78
Masafumi Nishimura | 2 | 112 | 22.77
Nobuyasu Itoh | 3 | 65 | 13.19