Word clustering for a word bi-gram model - Citegraph

Paper Info

Title
Word clustering for a word bi-gram model

Abstract
In this paper we describe a word clustering method for class-based n-gram models. The clustering criterion is the entropy on a corpus different from the corpus used for n-gram model estimation. The search method is based on a greedy algorithm. We applied this method to the Japanese EDR corpus and the English Penn Treebank corpus. The perplexities of the word-based n-gram model on the EDR corpus and the Penn Treebank are 153.1 and 203.5, respectively. Those of the class-based n-gram model estimated through our method are 146.4 and 136.0, respectively. The results indicate that our clustering method is better than Brown's method and Ney's method, called leaving-one-out.
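The abstract gives only the outline of the method: score a clustering by entropy on a corpus held out from model estimation, and search greedily. The following Python sketch is one possible reading of that outline, not the authors' implementation. The function names (`cross_entropy`, `greedy_cluster`), the add-alpha smoothing, the frequency-ordered word visit, and the toy corpora are all assumptions made for illustration. It uses the usual class bigram factorization P(w | prev) = P(c(w) | c(prev)) · P(w | c(w)).

```python
import math
from collections import Counter


def cross_entropy(train, heldout, cls, alpha=0.1):
    """Per-word cross-entropy (bits) of a class bigram model on `heldout`.

    Counts come from `train` only. Add-alpha smoothing is an assumption of
    this sketch; the abstract does not say how probabilities are smoothed.
    """
    n_classes = len(set(cls.values()))
    c_seq = [cls[w] for w in train]
    class_uni = Counter(c_seq)                 # tokens per class
    class_bi = Counter(zip(c_seq, c_seq[1:]))  # class bigram counts
    prev_count = Counter(c_seq[:-1])           # bigrams starting in a class
    word_uni = Counter(train)
    class_size = Counter(cls.values())         # word types per class

    total = 0.0
    for prev, w in zip(heldout, heldout[1:]):
        cp, cw = cls[prev], cls[w]
        p_class = (class_bi[(cp, cw)] + alpha) / (prev_count[cp] + alpha * n_classes)
        p_word = (word_uni[w] + alpha) / (class_uni[cw] + alpha * class_size[cw])
        total -= math.log2(p_class * p_word)
    return total / (len(heldout) - 1)


def greedy_cluster(train, heldout):
    """Greedy search: move one word at a time if held-out entropy drops."""
    freq = Counter(train)
    vocab = sorted(set(train) | set(heldout), key=lambda w: (-freq[w], w))
    cls = {w: i for i, w in enumerate(vocab)}  # start: one class per word
    best = cross_entropy(train, heldout, cls)
    for w in vocab:
        home = cls[w]
        best_target = home
        for target in sorted(set(cls.values()) - {home}):
            cls[w] = target                    # tentatively move w
            h = cross_entropy(train, heldout, cls)
            if h < best:
                best, best_target = h, target
        cls[w] = best_target                   # keep the best move, if any
    return cls, best


# Toy corpora (hypothetical data, only to exercise the functions).
train = "the cat sat on the mat the dog sat on the rug".split()
heldout = "the dog sat on the mat the cat sat on the rug".split()
vocab = set(train) | set(heldout)
baseline = cross_entropy(train, heldout, {w: i for i, w in enumerate(sorted(vocab))})
classes, entropy = greedy_cluster(train, heldout)
perplexity = 2 ** entropy  # perplexity is 2 to the per-word entropy in bits
```

Starting from one class per word makes the initial model equivalent to a word bigram model, so any accepted move can only lower the held-out entropy relative to that baseline. Scoring on held-out text rather than on the training corpus is what keeps the search from simply preferring the finest possible clustering.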

Year	Venue	Keywords	DocType	Citations	PageRank	References	Authors
1998	ICSLP	greedy algorithm	Conference	3	0.46	4	3

Authors (3 rows)

Name	Order	Citations	PageRank
Shinsuke Mori	1	474	47.78
Masafumi Nishimura	2	112	22.77
Nobuyasu Itoh	3	65	13.19
