Title
Internet Corpora: A Challenge for Linguistic Processing.
Abstract
Natural language processing tools are mostly developed for and optimized on newspaper texts, and often show a substantial performance drop when applied to other types of texts such as Twitter feeds, chat data or Internet forum posts. We explore a range of easy-to-implement methods of adapting existing part-of-speech taggers to improve their performance on Internet texts. Our results show that these methods can improve tagger performance substantially.
Year
2015
DOI
10.1007/s13222-014-0172-z
Venue
Datenbank-Spektrum
Keywords
Natural language processing, Part-of-speech tagging, Computer-mediated communication
Field
Data mining, World Wide Web, Deep linguistic processing, Computer science, Part-of-speech tagging, Newspaper, Computer-mediated communication, Multimedia, Database, The Internet
DocType
Journal
Volume
15
Issue
1
ISSN
1610-1995
Citations
0
PageRank
0.34
References
7
Authors
6
Name              Order  Citations  PageRank
Andrea Horbach    1      22         7.23
Stefan Thater     2      756        38.54
Diana Steffen     3      0          0.68
Peter M. Fischer  4      571        38.81
Andreas Witt      5      0          0.34
Manfred Pinkal    6      1116       69.77