Abstract |
---|
This paper presents work on part-of-speech tagging of German social media and web texts. We take a simple Hidden Markov Model-based tagger as a starting point and extend it with a distributional approach to estimating lexical (emission) probabilities of out-of-vocabulary words. Such words occur frequently in social media and web texts and are a major reason for the low performance of off-the-shelf taggers on these text types. We evaluate our approach on the recent EmpiriST 2015 shared task dataset and show that it improves accuracy on out-of-vocabulary tokens by up to 5.8%. Overall, we improve on the state of the art by 0.4%, reaching 90.9% accuracy. |
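
The abstract does not say how the distributional estimate is computed. The sketch below is only one plausible reading, not the paper's method: it assumes an out-of-vocabulary word borrows its tag distribution from its most similar in-vocabulary words, weighted by cosine similarity of context-count vectors. The function names, the parameter `k`, and the toy data are all hypothetical. It estimates P(tag | word), which can be turned into an HMM emission probability P(word | tag) with Bayes' rule.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors (dicts)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def oov_tag_distribution(word, vectors, tag_counts, k=2):
    """Estimate P(tag | word) for an unseen word (hypothetical helper).

    Assumption: the k most similar in-vocabulary words lend their
    relative tag frequencies, weighted by cosine similarity.
    """
    target = vectors[word]
    neighbours = sorted(
        ((cosine(target, vectors[w]), w) for w in tag_counts if w in vectors),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, neighbour in neighbours:
        total = sum(tag_counts[neighbour].values())
        for tag, count in tag_counts[neighbour].items():
            scores[tag] += similarity * count / total
    normaliser = sum(scores.values())
    if not normaliser:
        return {}
    return {tag: score / normaliser for tag, score in scores.items()}


# Toy data (invented for illustration): context counts and tag counts.
vectors = {
    "toll": {"voll": 2, "echt": 1},
    "geil": {"voll": 2, "echt": 1},
    "Haus": {"das": 3},
    "supi": {"voll": 1, "echt": 1},  # not seen in the tagged training data
}
tag_counts = {
    "toll": Counter(ADJD=4),
    "geil": Counter(ADJD=3),
    "Haus": Counter(NN=5),
}

dist = oov_tag_distribution("supi", vectors, tag_counts)
print(dist)
```

In this toy setting the unseen word shares its contexts with the two adjectives, so its estimated distribution puts its mass on their tag rather than on the noun tag. A real system would need smoothing and much larger unlabeled corpora to build the context vectors.
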
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-73706-5_7 | Lecture Notes in Artificial Intelligence |
DocType | Volume | ISSN |
---|---|---|
Conference | 10713 | 0302-9743 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Stefan Thater | 1 | 756 | 38.54 |