Title
Weakly supervised cyberbullying detection with participant-vocabulary consistency.
Abstract
Online harassment and cyberbullying are becoming serious social health threats that damage people’s lives. This phenomenon creates a need for automated, data-driven techniques for analyzing and detecting such detrimental online behavior. We propose a weakly supervised machine learning method that simultaneously infers user roles in harassment-based bullying and new vocabulary indicators of bullying. The learning algorithm considers social structure and infers which users tend to bully and which tend to be victimized. To address the elusive nature of cyberbullying with minimal effort and cost, the algorithm requires only weak supervision, in the form of a small, expert-provided seed set of bullying indicators; it then uses a large, unlabeled corpus of social media interactions to extract users' bullying roles and additional vocabulary indicators of bullying. The model estimates whether each social interaction is bullying based on who participates and on what language is used, and it tries to maximize the agreement between these two estimates, i.e., participant-vocabulary consistency (PVC). To evaluate PVC, we perform extensive quantitative and qualitative experiments on three social media datasets: Twitter, Ask.fm, and Instagram. We illustrate the strengths and weaknesses of the model by analyzing the conversations and key phrases identified by PVC. In addition, we present the distributions of bully and victim scores to examine the relationship between users' tendencies to bully and to be victimized. We also perform a fairness evaluation to analyze the potential for automated detection to be biased against particular groups.
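The abstract describes the participant-vocabulary agreement only at a high level. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes a squared-disagreement objective in which each message's participant score is the sender's bully score plus the receiver's victim score, its vocabulary score is the mean score of its n-grams, seed n-grams are clamped to 1.0, and all other scores are fit by plain full-batch gradient descent with L2 regularization. The function name `pvc_sketch`, the `(sender, receiver, ngrams)` message format, and the parameters `lam`, `lr`, and `epochs` are assumptions introduced for this example.

```python
def pvc_sketch(messages, seed_ngrams, lam=0.1, lr=0.1, epochs=1000):
    """Hypothetical PVC-style fit of bully (b), victim (v), and n-gram (w) scores.

    Assumed objective: lam/2 * ||scores||^2 + 1/2 * sum over messages of
    (b[sender] + v[receiver] - mean of w over the message's n-grams)^2.
    messages: list of (sender, receiver, ngrams) tuples (assumed format).
    seed_ngrams: expert-provided bullying indicators, clamped to score 1.0.
    """
    # Drop messages without n-grams to avoid dividing by zero.
    messages = [(s, r, list(g)) for s, r, g in messages if g]
    users = {u for s, r, _ in messages for u in (s, r)}
    ngrams = {k for _, _, grams in messages for k in grams}
    b = {u: 0.0 for u in users}
    v = {u: 0.0 for u in users}
    w = {k: (1.0 if k in seed_ngrams else 0.0) for k in ngrams}

    for _ in range(epochs):
        # Start each gradient with the L2 regularization term.
        gb = {u: lam * b[u] for u in users}
        gv = {u: lam * v[u] for u in users}
        gw = {k: lam * w[k] for k in ngrams}
        for s, r, grams in messages:
            participant = b[s] + v[r]
            vocabulary = sum(w[k] for k in grams) / len(grams)
            diff = participant - vocabulary  # disagreement for this message
            gb[s] += diff
            gv[r] += diff
            for k in grams:
                gw[k] -= diff / len(grams)
        # Full-batch gradient step; seed n-grams stay fixed at 1.0.
        for u in users:
            b[u] -= lr * gb[u]
            v[u] -= lr * gv[u]
        for k in ngrams:
            if k not in seed_ngrams:
                w[k] -= lr * gw[k]
    return b, v, w


if __name__ == "__main__":
    # Toy, made-up interactions; "loser" is the only seed indicator.
    toy = [
        ("alice", "bob", ["loser", "you"]),
        ("alice", "bob", ["loser", "ugly"]),
        ("alice", "carol", ["ugly", "you"]),
        ("dave", "carol", ["hello", "friend"]),
        ("dave", "bob", ["hello", "you"]),
    ]
    b, v, w = pvc_sketch(toy, {"loser"})
    print("bully scores:", sorted(b.items(), key=lambda kv: -kv[1]))
    print("victim scores:", sorted(v.items(), key=lambda kv: -kv[1]))
    print("n-gram scores:", sorted(w.items(), key=lambda kv: -kv[1]))
```

On this toy data, the sender who uses the seed term ends up with a higher bully score than the other sender, and "ugly", which co-occurs with the seed, scores above "hello". Because the objective is an assumed least-squares form, the paper's actual formulation and optimizer may differ.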
Year: 2018
DOI: 10.1007/s13278-018-0517-y
Venue: Social Netw. Analys. Mining
Field: Social relation, Internet privacy, Social media, Online harassment, Psychology, Phenomenon, Strengths and weaknesses, Vocabulary, Social determinants of health, Harassment
DocType: Journal
Volume: 8
Issue: 1
ISSN: 1869-5450
Citations: 0
PageRank: 0.34
References: 27
Authors: 2
Name | Order | Citations | PageRank
Elaheh Raisi | 1 | 15 | 3.73
Bert Huang | 2 | 563 | 39.09