Title
Detecting offensive tweets via topical feature discovery over a large scale twitter corpus
Abstract
In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively across a variety of machine learning (ML) algorithms. For instance, using Logistic Regression our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at about the same level as the baseline, around 3.77%. Our approach provides an alternative to the large-scale hand annotation required by fully supervised learning approaches.
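The abstract compares a keyword matching baseline against a Logistic Regression classifier over topic-model features, scored by TP and FP rates. The Python sketch below illustrates those three pieces under stated assumptions: each tweet is assumed to already be represented as a vector of topic proportions from a topic model (for example LDA) trained separately, the lexicon entries are placeholders, the function names are hypothetical, and the logistic regression is a plain full-batch gradient-descent fit rather than the authors' actual configuration.

```python
import math
from typing import List, Sequence, Set, Tuple


def tp_fp_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate); label 1 = offensive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of offensive tweets caught
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of clean tweets wrongly flagged
    return tpr, fpr


def keyword_baseline(tweets: Sequence[str], lexicon: Set[str]) -> List[int]:
    """Hypothetical baseline: flag a tweet if any lowercased token is in the lexicon."""
    return [1 if any(tok in lexicon for tok in t.lower().split()) else 0 for t in tweets]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression on (assumed) topic-proportion features by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: List[float], b: float, X: List[List[float]],
            threshold: float = 0.5) -> List[int]:
    """Label a tweet offensive when its predicted probability reaches the threshold."""
    return [1 if _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold else 0
            for xi in X]


if __name__ == "__main__":
    # Toy data: hypothetical proportions [profanity-like topic, other topic] per tweet.
    X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logreg(X, y)
    print(tp_fp_rates(y, predict(w, b, X)))  # (1.0, 0.0)
```

The contrast the sketch is meant to show: the baseline fires only on exact lexicon hits, while the classifier scores every tweet from learned weights over topic features, which is one way a topic-based approach can raise the TP rate over keyword matching without necessarily raising the FP rate.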
Year: 2012
DOI: 10.1145/2396761.2398556
Venue: CIKM
Keywords: topical feature discovery, huge twitter corpus, novel semi-supervised approach, detects offensive tweet, logistic regression, machine learning, large scale twitter corpus, true positive rate, profanity-related offensive content, false positive rate, large scale hand annotation, linguistic regularity, topic modeling
Field: Data mining, False positive rate, Computer science, Artificial intelligence, Annotation, Information retrieval, Exploit, Supervised learning, Topic model, Feature discovery, True positive rate, Machine learning, Offensive
DocType: Conference
Citations: 40
PageRank: 2.16
References: 10
Authors: 5
Name | Order | Citations | PageRank
Guang Xiang | 1 | 382 | 18.31
Bin Fan | 2 | 792 | 43.26
Ling Wang | 3 | 884 | 52.37
Jason Hong | 4 | 6706 | 518.75
Rosé Carolyn | 5 | 2126 | 222.80