Abstract | ||
---|---|---|
In this paper we describe a word clustering method for class-based n-gram models. The clustering criterion is the entropy on a corpus different from the corpus used for n-gram model estimation. The search method is based on a greedy algorithm. We applied this method to the Japanese EDR corpus and the English Penn Treebank corpus. The perplexities of the word-based n-gram model on the EDR corpus and the Penn Treebank are 153.1 and 203.5, respectively, and those of the class-based n-gram model estimated through our method are 146.4 and 136.0, respectively. The results indicate that our clustering method is better than Brown’s method and Ney’s method, called leaving-one-out. |
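
The idea summarized in the abstract (greedy search over word-to-class assignments, scored by entropy on a corpus separate from the one used for estimation) can be illustrated with a minimal Python sketch. This is not the authors' procedure. The bigram order, the add-alpha smoothing, the move-one-word-at-a-time search, the toy corpora, and the names `heldout_entropy` and `greedy_cluster` are all assumptions made for illustration.

```python
import math
from collections import Counter


def heldout_entropy(train, heldout, word2class, alpha=0.5):
    """Per-word cross-entropy (bits) of a class bigram model on `heldout`.

    The model P(w | prev) = P(c(w) | c(prev)) * P(w | c(w)) is estimated
    from `train` only. Add-alpha smoothing is an assumption of this sketch.
    """
    num_classes = len(set(word2class.values()))
    class_size = Counter(word2class.values())  # words per class
    word_counts = Counter(train)
    class_counts = Counter(word2class[w] for w in train)
    class_bigrams = Counter()
    class_history = Counter()
    for prev, cur in zip(train, train[1:]):
        class_bigrams[(word2class[prev], word2class[cur])] += 1
        class_history[word2class[prev]] += 1

    total = 0.0
    for prev, cur in zip(heldout, heldout[1:]):
        cp, cc = word2class[prev], word2class[cur]
        p_class = (class_bigrams[(cp, cc)] + alpha) / (
            class_history[cp] + alpha * num_classes)
        p_word = (word_counts[cur] + alpha) / (
            class_counts[cc] + alpha * class_size[cc])
        total -= math.log2(p_class * p_word)
    return total / (len(heldout) - 1)


def greedy_cluster(train, heldout, alpha=0.5):
    """Greedily move each word to the class that most lowers held-out entropy."""
    freq = Counter(train)
    # Visit words from most to least frequent (an assumed ordering).
    vocab = sorted(set(train) | set(heldout), key=lambda w: (-freq[w], w))
    word2class = {w: i for i, w in enumerate(vocab)}  # one class per word
    best = heldout_entropy(train, heldout, word2class, alpha)
    for w in vocab:
        home = word2class[w]
        best_class = home
        for c in sorted(set(word2class.values()) - {home}):
            word2class[w] = c  # tentatively move w into class c
            h = heldout_entropy(train, heldout, word2class, alpha)
            if h < best:
                best, best_class = h, c
        word2class[w] = best_class  # keep the best move, or stay put
    return word2class, best


if __name__ == "__main__":
    # Toy corpora, invented for this sketch.
    train = "the cat sat on the mat the dog sat on the rug a cat ran a dog ran".split()
    heldout = "the dog sat on the mat a cat sat on the rug".split()
    classes, entropy = greedy_cluster(train, heldout)
    print(classes)
    print("held-out perplexity:", 2 ** entropy)
```

Because a move is accepted only when the held-out entropy strictly decreases, the final entropy is never worse than that of the initial one-word-per-class model, which in this sketch plays the role of the word-based baseline.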
Year | Venue | Keywords |
---|---|---|
1998 | ICSLP | greedy algorithm |
DocType | Citations | PageRank |
---|---|---|
Conference | 3 | 0.46 |
References | Authors | |
---|---|---|
4 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Shinsuke Mori | 1 | 474 | 47.78 |
Masafumi Nishimura | 2 | 112 | 22.77 |
Nobuyasu Itoh | 3 | 65 | 13.19 |