Title | ||
---|---|---|
Automatic ranking of swear words using word embeddings and pseudo-relevance feedback. |
Abstract | ||
---|---|---|
This paper describes a method for automatically ranking a dictionary of swear words by their level of rudeness. The final ranking combines two baseline rankings: 1) one based on the normalized accumulated cosine similarity between the word embedding of the swear word and those of its n-best list of closest neighbors, and 2) one based on a pseudo-relevance feedback and bootstrapping algorithm. The proposed methods are trained on dialogues extracted from movie scripts and evaluated against a list of swear words manually ranked into 5 categories by four different annotators. The Spearman correlation coefficient between the rankings generated by the proposed system and a consolidated gold standard reaches a value similar to those obtained among the different human annotators, indicating that the proposed method is a good alternative to the manual process. |
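
The first baseline and the evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the accumulated similarity is normalized or where the neighbors are drawn from, so the sketch assumes a mean over the n closest words in a small vocabulary. The function names, the placeholder tokens `w1` to `w4`, the 2-d vectors, and the gold labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_score(word, embeddings, n=2):
    """Mean cosine similarity between `word` and its n closest neighbors.

    Assumption: dividing by the number of neighbors is one possible
    normalization; the abstract does not say which one is used.
    """
    sims = sorted(
        (cosine(embeddings[word], vec)
         for other, vec in embeddings.items() if other != word),
        reverse=True,
    )
    top = sims[:n]
    return sum(top) / len(top) if top else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical toy data: placeholder tokens, made-up vectors and labels.
toy_embeddings = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
    "w4": [0.7, 0.7],
}
toy_gold = {"w1": 5, "w2": 4, "w3": 1, "w4": 3}  # 5 categories, as in the abstract

words = sorted(toy_embeddings)
scores = [neighbor_score(w, toy_embeddings) for w in words]
gold = [toy_gold[w] for w in words]
rho = spearman(scores, gold)  # agreement between system scores and gold labels
```

In practice the vectors would come from embeddings trained on the movie dialogues, and `rho` would be compared with the correlations observed between pairs of human annotators. The pseudo-relevance feedback and bootstrapping baseline is not covered by this sketch.
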
Year | Venue | Field |
---|---|---|
2015 | Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | Normalization (statistics), Relevance feedback, Ranking, Cosine similarity, Bootstrapping, Computer science, Speech recognition, Natural language processing, Encyclopedia, Artificial intelligence, Spearman's rank correlation coefficient, Scripting language |
DocType | ISSN | Citations |
---|---|---|
Conference | 2309-9402 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 13 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Luis Fernando D'Haro | 1 | 181 | 25.97 |
Rafael E. Banchs | 2 | 566 | 63.64 |