Abstract |
---|
We empirically test Harris's hypothesis that morpheme and word boundaries can be detected from changes in the complexity of phoneme sequences. We reformulate the hypothesis from an information-theoretic viewpoint and test it on a corpus. The hypothesis holds for morphemes in both English and Chinese, with an F-score of about 80%. For word boundaries, however, English and Chinese yield contrary results, which reflects a difference in the nature of the two languages. |
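The abstract does not spell out the exact formulation, so the following is only a minimal sketch of one common information-theoretic reading of Harris's idea, assumed here rather than taken from the paper. It uses the branching entropy of the next symbol given the preceding prefix, H(X | prefix) = -Σ P(x | prefix) log2 P(x | prefix), and it places a boundary wherever that entropy rises. Letters stand in for phonemes, the toy corpus is invented for illustration, and all function names (`build_successor_counts`, `branching_entropy`, `predict_boundaries`, `split_at`, `f_score`) are hypothetical. The boundary-level F-score helper shows how a figure like the reported one could be computed against gold boundaries.

```python
import math
from collections import Counter, defaultdict

END = "#"  # hypothetical end-of-word marker so full words also have a successor


def build_successor_counts(corpus):
    """Map every word prefix (including "") to a Counter of the symbols that follow it."""
    counts = defaultdict(Counter)
    for word in corpus:
        symbols = list(word) + [END]
        for i in range(len(symbols)):
            counts[word[:i]][symbols[i]] += 1
    return counts


def branching_entropy(counts, prefix):
    """H(X | prefix) in bits; 0.0 for a prefix never seen in the corpus."""
    successors = counts.get(prefix)
    if not successors:
        return 0.0
    total = sum(successors.values())
    # sum of p * log2(1/p), written this way to avoid returning -0.0
    return sum((c / total) * math.log2(total / c) for c in successors.values())


def predict_boundaries(counts, word):
    """Assumed criterion: boundary at interior position i if H(word[:i]) > H(word[:i-1])."""
    boundaries = set()
    prev = branching_entropy(counts, "")
    for i in range(1, len(word)):
        cur = branching_entropy(counts, word[:i])
        if cur > prev:
            boundaries.add(i)
        prev = cur
    return boundaries


def split_at(word, boundaries):
    """Cut the word at the given positions."""
    pieces, start = [], 0
    for b in sorted(boundaries):
        pieces.append(word[start:b])
        start = b
    pieces.append(word[start:])
    return pieces


def f_score(predicted, gold):
    """Boundary-level F-score over parallel lists of boundary-position sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = ["walked", "walking", "walks", "talked", "talking", "talks"]
    counts = build_successor_counts(corpus)
    print(split_at("walking", predict_boundaries(counts, "walking")))  # ['walk', 'ing']
```

In this toy corpus the entropy stays at 0 through "w", "wa" and "wal", then jumps to log2 3 after "walk", because three different letters can follow. That jump is the kind of complexity increase the hypothesis associates with a morpheme boundary. A realistic test would use phoneme transcriptions and a much larger corpus.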
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/11940098_25 | ICCPOL |
Keywords | Field | DocType |
contrary result,information theoretic viewpoint,phoneme sequence,word boundary | Information theory,Morpheme,Isolating language,Computer science,Phonetics,Text segmentation,Artificial intelligence,Natural language processing,Statistical hypothesis testing | Conference |
Volume | ISSN | ISBN |
4285 | 0302-9743 | 3-540-49667-X |
Citations | PageRank | References |
4 | 0.54 | 5 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kumiko Tanaka-Ishii | 1 | 261 | 36.69 |
Zhihui Jin | 2 | 54 | 3.24 |