Title
Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings
Abstract
Online social networks allow powerless people to gain enormous amounts of control over particular people’s lives and to profit from the anonymity or social distance that the Internet provides in order to harass other people. One of the most frequently targeted groups is women, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it is still difficult to distinguish because it can sometimes be very subtle and deep, meaning that statistical approaches alone are not sufficient. Moreover, as Spanish is spoken worldwide, context and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. On the one hand, we apply Sentiment Analysis and Social Computing technologies to detect misogynous messages on Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and classified it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings and linguistic features in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving a best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection, such as AMI and HatEval, obtaining good results.
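The combination of average word embeddings with linguistic features mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy three-dimensional vectors in `TOY_EMBEDDINGS`, the whitespace tokenizer, the three surface features in `linguistic_features`, and the function names themselves. It only shows the general idea of averaging token vectors and concatenating the result with hand-crafted features before passing the vector to a classifier.

```python
from typing import Dict, List

# Toy 3-dimensional vectors (assumption). A real system would load
# pretrained Spanish word embeddings instead.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "mujer": [1.0, 0.0, 2.0],
    "casa": [3.0, 2.0, 0.0],
}


def average_embedding(tokens: List[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Average the vectors of all known tokens; zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def linguistic_features(text: str) -> List[float]:
    """Hypothetical surface features: token count, number of exclamation
    marks, and the share of fully upper-case tokens."""
    tokens = text.split()
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    uppercase_ratio = sum(1 for t in tokens if t.isupper()) / n
    return [float(len(tokens)), float(text.count("!")), uppercase_ratio]


def featurize(text: str,
              embeddings: Dict[str, List[float]] = TOY_EMBEDDINGS,
              dim: int = 3) -> List[float]:
    """Concatenate the averaged embedding with the linguistic features."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return average_embedding(tokens, embeddings, dim) + linguistic_features(text)
```

Each tweet then becomes one fixed-length vector (six values here) that any standard classifier could consume. Keeping the linguistic features as separate, named dimensions is what would make it possible to inspect which of them contribute most to a prediction.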
Year
2021
DOI
10.1016/j.future.2020.08.032
Venue
Future Generation Computer Systems
Keywords
Misogyny detection, Text classification, Natural language processing, Machine-learning
DocType
Journal
Volume
114
ISSN
0167-739X
Citations
1
PageRank
0.39
References
0
Authors
4