Title
On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Abstract
Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks: fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison: 1) the type of data scarcity, which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or from both; 2) the language type, by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g., Rusyn, Buryat, and Zulu); 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of data available for training the embeddings and task-based models, where having sufficient in-task data is the more critical requirement.
Year
2019
DOI
10.18653/v1/k19-1021
Venue
2989457779
Field
Computer science, Natural language processing, Artificial intelligence
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
6
Name                  Order  Citations  PageRank
Yi Zhu                1      296        59.12
Benjamin Heinzerling  2      0          0.68
Ivan Vulic            3      462        52.59
Michael Strube        4      2142       137.32
Roi Reichart          5      760        53.53
Anna Korhonen         6      1336       92.50