Title | ||
---|---|---|
Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art. |
Abstract | ||
---|---|---|
This paper addresses the challenge of morphological tagging and lemmatization in morphologically rich languages, using German and Latin as examples. We focus on the question of what a practitioner can expect when using state-of-the-art solutions out of the box. Moreover, we contrast these with older methods and implementations for POS tagging. We examine to what degree recent efforts in tagger development pay off in improved accuracy, and at what cost in terms of training and processing time. We also conduct in-domain vs. out-of-domain evaluation. Out-of-domain evaluations are particularly insightful because the distribution of the data a user tags will typically differ from the distribution on which the tagger was trained. Furthermore, we evaluate two lemmatization techniques. Finally, we compare pipeline tagging with a tagging approach that acknowledges dependencies between inflectional categories. |
Year | Venue | Keywords |
---|---|---|
2016 | LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | morphological tagging, lemmatization, morphologically rich languages |
Field | DocType | Citations |
---|---|---|
Lemmatisation, Computer science, Natural language processing, Artificial intelligence, German | Conference | 6 |
PageRank | References | Authors |
---|---|---|
0.52 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Steffen Eger | 1 | 77 | 25.00 |
Rüdiger Gleim | 2 | 39 | 6.27 |
Alexander Mehler | 3 | 186 | 36.63 |