Title
Word Segmentation for the Sequences Emitted from a Word-Valued Source
Abstract
Word segmentation is the most fundamental and important process in Japanese and Chinese language processing. Because these languages do not separate words, a sentence must first be split into words. Approaches based on probabilistic language models are known to be highly efficient for this problem, and this has been shown in practice. Separately, the word-valued source (WVS) has recently been proposed as a new class of source model for the source coding problem. This model can be expected to reflect more of the probabilistic structure of natural languages, and a Japanese or Chinese sentence may be regarded as a sequence emitted from a non-prefix-free WVS. In this paper, as the first phase of applying the WVS to natural language processing, we formulate a word segmentation problem for sequences emitted from a non-prefix-free WVS. We then examine the performance of word segmentation for these models by numerical computation.
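To make the segmentation problem concrete, the following is a minimal sketch, not the paper's formulation. It assumes the simplest possible word-valued source, one that emits words independently from a fixed distribution, and it recovers the most probable word sequence by dynamic programming. The names `segment` and `WORD_PROBS` and the toy probabilities are hypothetical. The word set is deliberately non-prefix-free ("a" is a prefix of "ab"), which is why the same string admits several segmentations.

```python
import math

# Hypothetical toy word distribution (an assumption for illustration only).
# It is non-prefix-free: "a" is a prefix of "ab", and "b" is a prefix of "ba".
WORD_PROBS = {"a": 0.4, "ab": 0.3, "b": 0.2, "ba": 0.1}


def segment(sequence, word_probs):
    """Return the most probable split of `sequence` into words, or None.

    Assumes words are emitted independently with the given probabilities.
    """
    n = len(sequence)
    max_len = max(len(w) for w in word_probs)
    # best[i] = (log-probability, start index of the last word) for sequence[:i]
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = word_probs.get(sequence[start:end])
            # Skip unknown or zero-probability words and unreachable prefixes.
            if not p or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # no segmentation into known words exists
    # Follow the backpointers from the end of the sequence.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(sequence[start:end])
        end = start
    return words[::-1]


print(segment("abab", WORD_PROBS))  # ['ab', 'ab']
```

Under these toy probabilities, "abab" could also be read as a|b|a|b or ab|a|b, but ab|ab has the highest product of word probabilities. A source with dependence between successive words would need a richer score than the independent-word product assumed here.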
Year: 2007
DOI: 10.1109/CIT.2007.195
Venue: CIT
Keywords: source model, word-valued source, chinese sentence, japanese sentence, word segmentation problem, non-prefix-free wvs, word segmentation, chinese language processing, probabilistic language model, language model, natural language, source code, natural language processing
Field: Computer science, Source code, Speech recognition, Text segmentation, Natural language, Artificial intelligence, Natural language processing, Probabilistic logic, Sentence, Word processing, Language model, Computation
DocType: Conference
ISBN: 0-7695-2983-6
Citations: 0
PageRank: 0.34
References: 5
Authors: 3
Name                  Order  Citations  PageRank
Takashi Ishida        1      12         5.23
Toshiyasu Matsushima  2      97         32.76
Shigeichi Hirasawa    3      322        150.91