Lexical normalisation of short text messages: makn sens a #twitter - Citegraph

Paper Info

Title
Lexical normalisation of short text messages: makn sens a #twitter

Abstract
Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this paper, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalising ill-formed words. Our method uses a classifier to detect ill-formed words, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn't require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
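The pipeline the abstract describes (detect ill-formed words, generate correction candidates, then pick one using word similarity and context) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The vocabulary, the bigram set, the 0.6 threshold, and the function names are all hypothetical. Plain character similarity from `difflib` stands in for morphophonemic similarity, and a simple out-of-vocabulary check stands in for the paper's classifier.

```python
from difflib import SequenceMatcher

# Hypothetical toy resources for illustration only; not taken from the paper.
VOCAB = {"making", "make", "sense", "see", "seen", "of", "the"}
BIGRAMS = {("making", "sense"), ("sense", "of")}


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for morphophonemic similarity."""
    return SequenceMatcher(None, a, b).ratio()


def generate_candidates(token, vocab, min_ratio=0.6):
    """Return in-vocabulary words that are sufficiently similar to the token."""
    return [w for w in vocab if similarity(token, w) >= min_ratio]


def score(token, candidate, prev, bigrams):
    """Combine word similarity with a crude context bonus from the previous word."""
    context_bonus = 0.5 if prev is not None and (prev, candidate) in bigrams else 0.0
    return similarity(token, candidate) + context_bonus


def normalise(tokens, vocab=VOCAB, bigrams=BIGRAMS):
    """Replace each out-of-vocabulary token with its best-scoring candidate."""
    out = []
    for tok in tokens:
        low = tok.lower()
        # Assumption: hashtags, mentions and in-vocabulary words are left untouched,
        # and every remaining token is treated as ill-formed (no classifier here).
        if tok.startswith(("#", "@")) or low in vocab:
            out.append(tok)
            continue
        cands = generate_candidates(low, vocab)
        if not cands:
            out.append(tok)  # nothing similar enough: keep the original token
            continue
        prev = out[-1].lower() if out else None
        out.append(max(cands, key=lambda w: score(low, w, prev, bigrams)))
    return out


print(normalise(["makn", "sens", "of", "#twitter"]))
# → ['making', 'sense', 'of', '#twitter']
```

In this toy setup "makn" scores 0.8 against "making" and 0.75 against "make", so similarity alone decides the first word, while the bigram bonus reinforces "sense" over "seen" for the second.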

Year	2011
Venue	ACL
Keywords	sms corpus, large volume, probable correction candidate, word similarity, novel dataset, correction candidate, morphophonemic similarity, makn sens, ill-formed word, short text message, out-of-vocabulary word, lexical normalisation
Field	Information retrieval, Computer science, Noisy text, Morphophonology, Natural language processing, Artificial intelligence, Classifier (linguistics), Text normalization
DocType	Conference
Volume	P11-1
Citations	190
PageRank	9.07
References	20
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Bo Han	1	593	29.85
Timothy Baldwin	2	452	22.18