Title
Automatic ranking of swear words using word embeddings and pseudo-relevance feedback.
Abstract
This paper describes a method for automatically ranking a dictionary of swear words by their level of rudeness. The final ranking combines two baseline rankings: 1) one based on the normalized accumulated cosine similarity between the word embedding of each swear word and the embeddings of its n-best closest neighbors, and 2) one produced by a pseudo-relevance feedback and bootstrapping algorithm. The proposed methods are trained on dialogues extracted from movie scripts and evaluated against a list of swear words manually ranked into five categories by four different annotators. The Spearman correlation coefficient between the rankings generated by the proposed system and a consolidated gold standard is similar to the values obtained between the different human annotators, showing that the proposed method is a good alternative to the manual process.
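As a rough illustration of the first baseline and of the evaluation described in the abstract, the sketch below scores each swear word by the cosine similarity it accumulates over its n-best closest neighbors in an embedding table, fuses that score with a second ranking, and compares the result with graded annotations using Spearman's rho. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the normalization, the fusion rule, or the pseudo-relevance feedback step, so dividing by n, averaging rank positions, and the random stand-in for the second ranking are all assumptions, and every function name here (`accumulated_similarity`, `combine`, `average_ranks`, `spearman_rho`) is hypothetical. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, because a five-category gold standard contains many ties.

```python
import numpy as np


def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


def accumulated_similarity(vectors: np.ndarray, query_ids, n_best: int) -> np.ndarray:
    """Hypothetical baseline score: cosine similarity accumulated over each
    query word's n_best closest neighbours (itself excluded), divided by
    n_best. The division is an ASSUMED normalization."""
    sims = cosine_matrix(vectors)
    scores = []
    for i in query_ids:
        row = np.delete(sims[i], i)          # drop the word's self-similarity
        top = np.sort(row)[::-1][:n_best]    # n-best closest neighbours
        scores.append(top.sum() / n_best)    # accumulate, then normalize
    return np.array(scores)


def average_ranks(values) -> np.ndarray:
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def combine(scores_a, scores_b) -> np.ndarray:
    """ASSUMED fusion rule: mean of the two rank positions (higher = ruder)."""
    return (average_ranks(scores_a) + average_ranks(scores_b)) / 2.0


def spearman_rho(x, y) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(50, 8))       # toy embedding table (hypothetical)
    swear_ids = [0, 1, 2, 3, 4]              # hypothetical dictionary entries
    gold = [5, 4, 4, 2, 1]                   # hypothetical 5-level annotations
    emb_scores = accumulated_similarity(vectors, swear_ids, n_best=10)
    prf_scores = rng.random(len(swear_ids))  # stand-in for the PRF/bootstrapping ranking
    final = combine(emb_scores, prf_scores)
    print("Spearman rho vs gold:", spearman_rho(final, gold))
```

Averaging rank positions rather than raw scores is one simple way to fuse two rankings whose scores live on different scales; the same `spearman_rho` function can also be used to measure agreement between pairs of human annotators, which is the comparison the abstract reports.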
Year: 2015
Venue: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
Field: Normalization (statistics), Relevance feedback, Ranking, Cosine similarity, Bootstrapping, Computer science, Speech recognition, Natural language processing, Encyclopedia, Artificial intelligence, Spearman's rank correlation coefficient, Scripting language
DocType: Conference
ISSN: 2309-9402
Citations: 0
PageRank: 0.34
References: 13
Authors: 2
Name                  Order  Citations  PageRank
Luis Fernando D'Haro  1      181        25.97
Rafael E. Banchs      2      566        63.64