Unsupervised segmentation of Chinese text by use of branching entropy - Citegraph

Paper Info

Title
Unsupervised segmentation of Chinese text by use of branching entropy

Abstract
We propose an unsupervised segmentation method based on an assumption about language data: that the increasing point of entropy of successive characters is the location of a word boundary. A large-scale experiment was conducted by using 200 MB of unsegmented training data and 1 MB of test data, and precision of 90% was attained with recall being around 80%. Moreover, we found that the precision was stable at around 90% independently of the learning data size.
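The assumption stated in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a forward-only scan, a fixed maximum context length (`max_n`), and a boundary wherever the entropy of the next character is higher than it was for the context one character shorter. The function names (`build_successor_counts`, `branching_entropy`, `segment`) and the toy Latin-letter corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def build_successor_counts(corpus, max_n):
    """For every context of length 1..max_n, count the characters that follow it."""
    successors = defaultdict(Counter)
    for i in range(len(corpus)):
        for n in range(1, max_n + 1):
            if i + n < len(corpus):
                successors[corpus[i:i + n]][corpus[i + n]] += 1
    return successors


def branching_entropy(successors, context):
    """Entropy (in bits) of the character following `context`; None if unseen."""
    counts = successors.get(context)
    if not counts:
        return None
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def segment(text, successors, max_n):
    """Cut `text` wherever the branching entropy rises as the context grows.

    Assumption for this sketch: a rise of any size counts as a boundary.
    """
    if not text:
        return []
    boundaries = set()
    for start in range(len(text)):
        previous = None
        for n in range(1, max_n + 1):
            end = start + n
            if end > len(text):
                break
            h = branching_entropy(successors, text[start:end])
            if h is None:
                break  # context never seen in the training text
            if previous is not None and h > previous:
                boundaries.add(end)
            previous = h
    cuts = sorted(b for b in boundaries if 0 < b < len(text))
    words, last = [], 0
    for cut in cuts + [len(text)]:
        words.append(text[last:cut])
        last = cut
    return words


# Toy unsegmented "training" text built from the two pseudo-words "ab" and "cd".
toy_corpus = "ababcdcdab"
toy_counts = build_successor_counts(toy_corpus, max_n=2)
print(segment("cdabab", toy_counts, max_n=2))
```

In the toy corpus, "a" is always followed by "b", so the entropy after "a" is zero, while "ab" is followed by either "a" or "c", so the entropy rises after "ab" and a boundary is placed there. A real experiment would need far more training text, and the choices of context length, scan direction, and any threshold on the rise are left open here.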

Year	Venue	DocType	Volume	Citations	PageRank	References	Authors
2006	ACL	Conference	P06-2	35	1.60	8	2

Keywords
large-scale experiment, test data, chinese text, unsupervised segmentation method, unsegmented training data, language data, successive character, increasing point, data size, unsupervised segmentation, word boundary

Authors (2 rows)

Name	Order	Citations	PageRank
Zhihui Jin	1	54	3.24
Kumiko Tanaka-Ishii	2	261	36.69
