Title
Selecting the UD v2 Morphological Features for Indonesian Dependency Treebank
Abstract
The objectives of our work are to propose relevant Universal Dependencies (UD) morphological features for an Indonesian dependency treebank and to apply the proposed features to an existing treebank. We propose the use of 14 UD v2 features and 27 corresponding feature-value tags. To evaluate the quality of the resulting treebank, we built models for lemmatization, POS tagging, morphological feature analysis, and dependency parsing using UDPipe, a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files. For lemmatization, POS tagging, and morphological feature analysis, the resulting models achieve F1-scores above 93%, which indicates that the annotations in the LEMMA, UPOS, and FEATS columns of the treebank are already consistent. However, the Indonesian dependency parser reaches an unlabeled attachment score (UAS) of only 82.59% and a labeled attachment score (LAS) of 79.83%. The experiments also show that morphological feature information has little or no impact on the quality of the lemmatization, POS tagging, and dependency parsing models for Indonesian.
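To make the notion of a feature-value tag concrete: in a CoNLL-U file, the sixth column (FEATS) holds pipe-separated Feature=Value pairs, and each distinct pair counts as one tag. The Python sketch below, assuming a plain CoNLL-U string as input, counts the distinct tags used in a treebank. The function names and the toy Indonesian sentence with its features are hypothetical illustrations, not annotations taken from the paper.

```python
from collections import Counter


def parse_feats(feats: str) -> dict[str, str]:
    """Split a CoNLL-U FEATS field like 'Number=Sing|Person=1' into a dict."""
    if feats == "_":  # '_' marks a word with no morphological features
        return {}
    result = {}
    for item in feats.split("|"):
        name, value = item.split("=", 1)
        # Multi-valued features (e.g. 'PronType=Int,Rel') keep the whole value string.
        result[name] = value
    return result


def count_feature_value_tags(conllu_text: str) -> Counter:
    """Count 'Feature=Value' tags over all syntactic words in a CoNLL-U string."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence boundaries and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges ('1-2') and empty nodes ('1.1')
        for name, value in parse_feats(cols[5]).items():  # column 6 = FEATS
            counts[f"{name}={value}"] += 1
    return counts


# Hypothetical toy sentence; the FEATS values are illustrative only.
SAMPLE = "\n".join([
    "# text = Saya membaca buku .",
    "1\tSaya\tsaya\tPRON\t_\tNumber=Sing|Person=1|PronType=Prs\t2\tnsubj\t_\t_",
    "2\tmembaca\tbaca\tVERB\t_\tVoice=Act\t0\troot\t_\t_",
    "3\tbuku\tbuku\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

if __name__ == "__main__":
    tags = count_feature_value_tags(SAMPLE)
    print(len(tags), sorted(tags))
```

Applied to a full treebank file, the number of keys in the returned counter is the number of distinct feature-value tags actually in use, which can be compared against the 27 tags proposed for the 14 features.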
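The two parser figures are attachment scores: UAS is the percentage of words assigned the correct head, and LAS is the percentage assigned both the correct head and the correct dependency label, so LAS can never exceed UAS. The following is a minimal Python sketch, assuming gold tokenization and word-aligned (head, deprel) pairs. The function `attachment_scores` and its `universal_only` option are hypothetical names, and this is not UDPipe's own evaluation code.

```python
def attachment_scores(gold, pred, universal_only=True):
    """Return (UAS, LAS) as percentages for word-aligned gold and predicted trees.

    gold, pred: lists of (head, deprel) tuples, one per syntactic word,
    assumed to share the same (gold) tokenization.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")

    def label(deprel):
        # Optionally drop subtypes, e.g. 'nsubj:pass' -> 'nsubj'.
        return deprel.split(":")[0] if universal_only else deprel

    uas_hits = 0
    las_hits = 0
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
        if g_head == p_head:
            uas_hits += 1  # correct head: counts toward UAS
            if label(g_rel) == label(p_rel):
                las_hits += 1  # correct head and label: counts toward LAS
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Toy example: one wrong label (word 3) and one wrong head (word 4).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

Because a word can only count toward LAS after its head is already correct, the gap between the two scores (here 25 points, in the paper 82.59% vs. 79.83%) measures label errors on correctly attached words.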
Year: 2020
DOI: 10.1109/IALP51396.2020.9310513
Venue: 2020 International Conference on Asian Language Processing (IALP)
Keywords: annotation guidelines, dependency treebank, morphological features, Universal Dependencies
DocType: Conference
ISSN: 2159-1962
ISBN: 978-1-7281-7690-1
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name | Order | Citations | PageRank
Ika Alfina | 1 | 7 | 2.85
Daniel Zeman | 2 | 434 | 37.62
Arawinda Dinakaramani | 3 | 0 | 0.34
Indra Budi | 4 | 34 | 9.48
Heru Suhartanto | 5 | 1 | 2.04