Splitting noun compounds via monolingual and bilingual paraphrasing: a study on Japanese katakana words

Paper Info

Title
Splitting noun compounds via monolingual and bilingual paraphrasing: a study on Japanese katakana words

Abstract
Word boundaries within noun compounds are not marked by white spaces in a number of languages, unlike in English, and it is beneficial for various NLP applications to split such noun compounds. In the case of Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary. To overcome this difficulty, we propose using monolingual and bilingual paraphrases of katakana noun compounds for identifying word boundaries. Experiments demonstrated that splitting accuracy is substantially improved by extracting such paraphrases from unlabeled textual data, the Web in our case, and then using that information for constructing splitting models.

Year	Venue	Keywords
2011	EMNLP	various NLP application, noun compound, Japanese katakana word, bilingual paraphrase, splitting noun compound, bilingual paraphrasing, unlabeled textual data, splitting accuracy, katakana noun compound, splitting model, transliterated foreign word, word boundary, katakana word
Field	DocType	Volume
Noun compounds, Computer science, Artificial intelligence, Natural language processing, Linguistics, Katakana	Conference	D11-1
Citations	PageRank	References
7	0.49	22
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Nobuhiro Kaji	1	257	21.71
Masaru Kitsuregawa	2	3188	831.46
