Detecting offensive tweets via topical feature discovery over a large scale twitter corpus - Citegraph

Paper Info

Title
Detecting offensive tweets via topical feature discovery over a large scale twitter corpus

Abstract
In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content on Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline at about 3.77%. Our approach provides an alternative to large scale hand annotation efforts required by fully supervised learning approaches.
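The two metrics quoted in the abstract, true positive rate and false positive rate, can be illustrated with a minimal sketch of a keyword matching baseline. This is not the authors' code: the lexicon, the toy tweets, the labels, and the function names `keyword_baseline` and `tp_fp_rates` are all hypothetical and chosen only to show how the rates are computed.

```python
import re

# Hypothetical placeholder lexicon; the paper's actual keyword list is not reproduced here.
OFFENSIVE_KEYWORDS = {"badword", "slur"}


def keyword_baseline(tweet, lexicon=OFFENSIVE_KEYWORDS):
    """Flag a tweet as offensive if any of its tokens appears in the lexicon."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in lexicon for tok in tokens)


def tp_fp_rates(labels, predictions):
    """Return (true positive rate, false positive rate) for boolean labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Toy data (made up for illustration): True means "offensive".
tweets = [
    "you are a badword",
    "what a nice day",
    "that slur again",
    "subtle insult without keywords",    # offensive, but missed by keyword matching
    "badword is a funny word to study",  # benign, but flagged by keyword matching
]
labels = [True, False, True, True, False]

predictions = [keyword_baseline(t) for t in tweets]
tpr, fpr = tp_fp_rates(labels, predictions)
print(f"TP rate: {tpr:.3f}, FP rate: {fpr:.3f}")
```

On this toy data the baseline misses the offensive tweet that contains no listed keyword and flags a benign tweet that mentions one, which are the two failure modes that features beyond a fixed keyword list are meant to reduce.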

Year	DOI	Venue
2012	10.1145/2396761.2398556	CIKM
Keywords	Field	DocType
topical feature discovery,huge twitter corpus,novel semi-supervised approach,detects offensive tweet,logistic regression,machine learning,large scale twitter corpus,true positive rate,profanity-related offensive content,false positive rate,large scale hand annotation,linguistic regularity,topic modeling	Data mining,False positive rate,Computer science,Artificial intelligence,Annotation,Information retrieval,Exploit,Supervised learning,Topic model,Feature discovery,True positive rate,Machine learning,Offensive	Conference
Citations	PageRank	References
40	2.16	10
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Guang Xiang	1	382	18.31
Bin Fan	2	792	43.26
Ling Wang	3	884	52.37
Jason Hong	4	6706	518.75
Carolyn Rosé	5	2126	222.80
