Title
Utilizing The Web For Automatic Word Spacing
Abstract
This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing train the parameters of word spacing models on noise-free data. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that uses these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.
Year
2009
DOI
10.1587/transinf.E92.D.2553
Venue
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords
word spacing, word segmentation
Field
Bottleneck, World Wide Web, Computer science, Segmentation, Text segmentation, Speech recognition, Vocabulary, Word processing
DocType
Journal
Volume
E92D
Issue
12
ISSN
1745-1361
Citations
0
PageRank
0.34
References
7
Authors
5
Name             Order  Citations  PageRank
Gumwon Hong      1      42         5.46
Jeong-Hoon Lee   2      291        16.06
Young-In Song    3      496        30.11
Do-Gil Lee       4      73         10.82
Hae-Chang Rim    5      828        89.14