Word Segmentation for the Sequences Emitted from a Word-Valued Source - Citegraph

Paper Info

Title
Word Segmentation for the Sequences Emitted from a Word-Valued Source

Abstract
Word segmentation is the most fundamental and important process in Japanese and Chinese language processing. Because these languages do not separate words, a sequence must first be segmented into words. For this problem, the approach based on a probabilistic language model is known to be highly efficient, and this has been demonstrated in practice. Recently, a word-valued source (WVS) has been proposed as a new class of source model for the source coding problem. This model is thought to reflect more of the probabilistic structure of natural languages. A Japanese or Chinese sentence may be regarded as a sequence emitted from a non-prefix-free WVS. In this paper, as the first phase of applying WVS to natural language processing, we formulate a word segmentation problem for sequences emitted from a non-prefix-free WVS. We then examine the performance of word segmentation for the models by numerical computation.

Year	DOI	Venue
2007	10.1109/CIT.2007.195	CIT
Keywords	Field	DocType
source model,word-valued source,chinese sentence,japanese sentence,word segmentation problem,non-prefix-free wvs,word segmentation,chinese language processing,probabilistic language model,language model,natural language,source code,natural language processing	Computer science,Source code,Speech recognition,Text segmentation,Natural language,Artificial intelligence,Natural language processing,Probabilistic logic,Sentence,Word processing,Language model,Computation	Conference
ISBN	Citations	PageRank
0-7695-2983-6	0	0.34
References	Authors
5	3

Authors (3 rows)

Name	Order	Citations	PageRank
Takashi Ishida	1	12	5.23
Toshiyasu Matsushima	2	97	32.76
Shigeichi Hirasawa	3	322	150.91
