Title | ||
---|---|---|
User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization |
Abstract | ||
---|---|---|
Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrated the low performance of existing MA/LN methods for non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT. |
Year | Venue | DocType |
---|---|---|
2021 | NAACL-HLT | Conference |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shohei Higashiyama | 1 | 2 | 2.75 |
Masao Utiyama | 2 | 714 | 86.69 |
Taro Watanabe | 3 | 12 | 3.33 |
Eiichiro Sumita | 4 | 1466 | 190.87 |