Title | ||
---|---|---|
Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus |
Abstract | ||
---|---|---|
In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content on Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline, at about 3.77%. Our approach provides an alternative to the large scale hand annotation efforts required by fully supervised learning approaches. |
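
The abstract reports results as a true positive rate (TP) and a false positive rate (FP). As a minimal sketch of how these two rates are conventionally computed from binary predictions, the snippet below uses a hypothetical helper `rates` and toy labels invented for illustration; neither comes from the paper, and the paper's own evaluation code may differ.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels.

    Assumed convention: 1 marks an offensive tweet, 0 a non-offensive one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # offensive, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # offensive, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, passed
    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
tpr, fpr = rates(y_true, y_pred)
print(f"TP rate = {tpr:.2f}, FP rate = {fpr:.2f}")
```

Comparing two detectors at a matched FP rate, as the abstract does, makes the TP rates directly comparable: any gain in detected offensive tweets is not paid for with extra false alarms.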
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2396761.2398556 | CIKM |
Keywords | Field | DocType |
topical feature discovery,huge twitter corpus,novel semi-supervised approach,detects offensive tweet,logistic regression,machine learning,large scale twitter corpus,true positive rate,profanity-related offensive content,false positive rate,large scale hand annotation,linguistic regularity,topic modeling | Data mining,False positive rate,Computer science,Artificial intelligence,Annotation,Information retrieval,Exploit,Supervised learning,Topic model,Feature discovery,True positive rate,Machine learning,Offensive | Conference |
Citations | PageRank | References |
40 | 2.16 | 10 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Guang Xiang | 1 | 382 | 18.31 |
Bin Fan | 2 | 792 | 43.26 |
Ling Wang | 3 | 884 | 52.37 |
Jason Hong | 4 | 6706 | 518.75 |
Carolyn Rosé | 5 | 2126 | 222.80 |