Lexical normalization for social media text - Citegraph

Paper Info

Title
Lexical normalization for social media text

Abstract
Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this article, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalizing lexical variants. Our method uses a classifier to detect lexical variants, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn't require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.
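
The pipeline described in the abstract (detect a lexical variant, generate correction candidates, pick the best one) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the toy lexicon `VOCAB`, the dictionary lookup that stands in for the classifier, the `consonant_skeleton` key that stands in for morphophonemic similarity, and the string-similarity ranking that stands in for the combined word-similarity and context model. Context-based selection is left out entirely.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon of in-vocabulary words (assumption, not from the paper).
VOCAB = {"tomorrow", "together", "see", "you", "tonight", "later", "great"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def consonant_skeleton(word: str) -> str:
    """Crude stand-in for a phonemic key: drop vowels, collapse repeated letters."""
    out = []
    for ch in word.lower():
        if ch in "aeiou":
            continue
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def candidates(token: str, max_dist: int = 2) -> set:
    """Lexicon words within max_dist edits or sharing the consonant skeleton."""
    tok = token.lower()
    key = consonant_skeleton(tok)
    return {w for w in VOCAB
            if edit_distance(tok, w) <= max_dist or consonant_skeleton(w) == key}


def normalize(token: str) -> str:
    """Return the token unchanged if in-vocabulary, else the most similar candidate."""
    tok = token.lower()
    if tok in VOCAB:          # dictionary lookup stands in for the variant classifier
        return tok
    cands = candidates(tok)
    if not cands:             # no plausible correction: leave the token as-is
        return token
    # Sort first so that ties are broken deterministically.
    return max(sorted(cands), key=lambda w: SequenceMatcher(None, tok, w).ratio())


print(normalize("tmrw"))   # tomorrow
print(normalize("latr"))   # later
```

In this sketch "tmrw" reaches "tomorrow" only through the shared consonant skeleton, since the two strings are four edits apart, while "latr" reaches "later" through a single edit. A token with no candidate in the lexicon is returned unchanged.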

Year: 2013
DOI: 10.1145/2414425.2414430
Venue: ACM TIST
Keywords: word similarity, out-of-vocabulary word, normalizing lexical variant, lexical variant, morphophonemic similarity, lexical normalization, sms corpus, correction candidate, social media text, large volume, probable correction candidate, text analysis, microblog
Field: Text mining, Normalization (statistics), Social media, Computer science, Microblogging, Speech recognition, Morphophonology, Natural language processing, Artificial intelligence, Classifier (linguistics)
DocType: Journal
Volume: 4
Issue: 1
ISSN: 2157-6904
Citations: 56
PageRank: 2.09
References: 38
Authors: 3

Authors (3 rows)

Name	Order	Citations	PageRank
Bo Han	1	593	29.85
Paul Cook	2	345	14.35
Timothy Baldwin	3	426	20.64
