Abstract | ||
---|---|---|
The rapid proliferation of microblogs such as Twitter has resulted in a vast quantity of written text becoming available that contains interesting information for NLP tasks. However, the noise level in tweets is so high that standard NLP tools perform poorly. In this paper, we present a statistical truecaser for tweets using a 3-gram language model built with truecased newswire texts and tweets. Our truecasing method shows an improvement in named entity recognition and part-of-speech tagging tasks. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2740908.2743039 | WWW (Companion Volume) |
Field | DocType | Citations |
Capitalization,Data mining,World Wide Web,Social media,Truecasing,Computer science,Noise level,Microblogging,Natural language processing,Artificial intelligence,Named-entity recognition,Language model | Conference | 1 |
PageRank | References | Authors |
0.36 | 16 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kamel Nebhi | 1 | 1 | 0.36 |
Kalina Bontcheva | 2 | 2538 | 211.33 |
Genevieve Gorrell | 3 | 266 | 22.00 |