Title
Splitting noun compounds via monolingual and bilingual paraphrasing: a study on Japanese katakana words
Abstract
Word boundaries within noun compounds are not marked by white spaces in a number of languages, unlike in English, and it is beneficial for various NLP applications to split such noun compounds. In the case of Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary. To overcome this difficulty, we propose using monolingual and bilingual paraphrases of katakana noun compounds for identifying word boundaries. Experiments demonstrated that splitting accuracy is substantially improved by extracting such paraphrases from unlabeled textual data, the Web in our case, and then using that information for constructing splitting models.
Year: 2011
Venue: EMNLP
Keywords: various nlp application, noun compound, japanese katakana word, bilingual paraphrase, splitting noun compound, bilingual paraphrasing, unlabeled textual data, splitting accuracy, katakana noun compound, splitting model, transliterated foreign word, word boundary, katakana word
Field: Noun compounds, Computer science, Artificial intelligence, Natural language processing, Linguistics, Katakana
DocType: Conference
Volume: D11-1
Citations: 7
PageRank: 0.49
References: 22
Authors: 2
Name, Order, Citations, PageRank
Nobuhiro Kaji, 1, 257, 21.71
Masaru Kitsuregawa, 2, 3188, 831.46