Abstract | ||
---|---|---|
Word segmentation is the most fundamental and important process in Japanese and Chinese language processing. Because these languages do not separate words, a character sequence must first be split into words. For this problem, approaches based on probabilistic language models are known to be highly efficient, and this has been demonstrated in practice. Meanwhile, the word-valued source (WVS) has recently been proposed as a new class of source model for the source coding problem. This model is expected to reflect more of the probabilistic structure of natural languages. A Japanese or Chinese sentence may be regarded as a sequence emitted from a non-prefix-free WVS. In this paper, as the first phase of applying WVS to natural language processing, we formulate a word segmentation problem for sequences from a non-prefix-free WVS. We then examine the performance of word segmentation for these models by numerical computation. |
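
To make the segmentation problem in the abstract concrete, the following is a minimal sketch, not the paper's formulation. It assumes the simplest possible source: words drawn independently from a fixed distribution, with a word set that is not prefix-free (here `a` is a prefix of `ab`). The function name `segment` and the toy probabilities in `toy_probs` are hypothetical and chosen only for illustration. Dynamic programming over end positions finds the most probable split.

```python
import math


def segment(text, word_probs):
    """Return the most probable split of `text` into words from `word_probs`.

    Assumes words are emitted independently (an illustrative assumption,
    not necessarily the model used in the paper). Returns None when no
    split into known words exists.
    """
    n = len(text)
    max_len = max(len(w) for w in word_probs)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_probs and best[start] > -math.inf:
                score = best[start] + math.log(word_probs[word])
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical non-prefix-free word set: "a" is a prefix of "ab".
toy_probs = {"a": 0.4, "ab": 0.3, "b": 0.2, "ba": 0.1}
print(segment("abab", toy_probs))
```

Because the word set is not prefix-free, the string `abab` has several valid splits (for example `a|b|a|b` and `ab|ab`), and the sketch selects the one with the highest product of word probabilities.
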
Year | DOI | Venue |
---|---|---|
2007 | 10.1109/CIT.2007.195 | CIT |
Keywords | Field | DocType |
source model,word-valued source,chinese sentence,japanese sentence,word segmentation problem,non-prefix-free wvs,word segmentation,chinese language processing,probabilistic language model,language model,natural language,source code,natural language processing | Computer science,Source code,Speech recognition,Text segmentation,Natural language,Artificial intelligence,Natural language processing,Probabilistic logic,Sentence,Word processing,Language model,Computation | Conference |
ISBN | Citations | PageRank |
0-7695-2983-6 | 0 | 0.34 |
References | Authors | |
5 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Takashi Ishida | 1 | 12 | 5.23 |
Toshiyasu Matsushima | 2 | 97 | 32.76 |
Shigeichi Hirasawa | 3 | 322 | 150.91 |