Title
Deep analysis of word sense disambiguation via semi-supervised learning and neural word representations
Highlights
• Ambiguity is a central challenge in text mining, addressed by word sense disambiguation algorithms.
• The lack of labeled datasets is a barrier, and semi-supervised learning (SSL) addresses this problem.
• Our method explores SSL with the word embeddings CBOW, Skip-gram, FastText, GloVe, BERT and ELECTRA.
• The F1-score increases on several datasets, including Senseval-2, Senseval-3, SemEval-2007 and SemCor.
Abstract
Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context. Different approaches have been proposed in both supervised and unsupervised settings, and in most cases supervised learning provides superior WSD performance. However, sense-annotated corpora are difficult and time-consuming to obtain, and the annotation effort must be repeated for new domains, languages, and sense inventories. Semi-supervised learning (SSL) methods, which combine a small amount of sense-annotated data with unlabeled data, are therefore becoming prominent. In SSL, graph-based methods are common because they capture the relationships between terms using an undirected graph. This paper investigates semi-supervised WSD by considering different graph-based SSL algorithms with features generated by word embeddings from Word2Vec, FastText, GloVe, BERT and ELECTRA models, combined with part-of-speech tags and word context. We test several combinations of word-embedding models, similarity measures for graph construction, and SSL classification algorithms to disambiguate classical lexical-sample WSD datasets. The results indicate that our SSL algorithms achieve results competitive with supervised ones, and that the ELECTRA models perform better than the other embeddings for SSL.
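The pipeline the abstract describes (embedding vectors, a similarity graph, then graph-based SSL classification) can be illustrated with a minimal sketch. It assumes toy 2-D vectors in place of real word embeddings, a cosine k-nearest-neighbour graph, and a generic clamped label-propagation loop. The function names `knn_cosine_graph` and `propagate_labels` are hypothetical, and the sketch is not necessarily any of the similarity measures or SSL algorithms evaluated in the paper.

```python
import numpy as np

def knn_cosine_graph(X, k=2):
    """Build a symmetric k-nearest-neighbour graph weighted by cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                          # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)           # never pick a node as its own neighbour
    W = np.zeros_like(S)
    for i in range(len(X)):
        nbrs = np.argsort(S[i])[-k:]       # indices of the k most similar nodes
        W[i, nbrs] = np.clip(S[i, nbrs], 0.0, None)
    return np.maximum(W, W.T)              # symmetrise: undirected graph

def propagate_labels(W, y, n_classes, n_iter=100):
    """Clamped label propagation; y holds sense ids, -1 marks unlabeled nodes."""
    labeled = y >= 0
    F = np.zeros((len(y), n_classes))
    F[labeled, y[labeled]] = 1.0           # one-hot rows for labeled nodes
    Y0 = F.copy()
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                    # avoid division by zero for isolated nodes
    P = W / deg                            # row-normalised transition matrix
    for _ in range(n_iter):
        F = P @ F                          # spread label mass to neighbours
        F[labeled] = Y0[labeled]           # clamp the known labels
    return F.argmax(axis=1)

# Toy stand-ins for context embeddings of one ambiguous word (assumed data):
# the first three occurrences point one way, the last three another.
X = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
              [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]])
y = np.array([0, -1, -1, 1, -1, -1])       # only two occurrences are sense-annotated

W = knn_cosine_graph(X, k=2)
pred = propagate_labels(W, y, n_classes=2)
print(pred.tolist())                       # [0, 0, 0, 1, 1, 1]
```

With one annotated occurrence per sense, the remaining four occurrences inherit the sense of their nearest labeled neighbour through the graph, which is the basic idea behind using a small sense-annotated set together with unlabeled data.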
Year: 2021
DOI: 10.1016/j.ins.2021.04.006
Venue: Periodicals
Keywords: Natural language processing, Word sense disambiguation, Semi-supervised learning, Word embeddings, Graph-based methods
DocType: Journal
Volume: 570
Issue: C
ISSN: 0020-0255
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name, Order, Citations, PageRank
José Marcio Duarte, 1, 0, 0.34
Samuel Sousa, 2, 0, 0.34
Evangelos Milios, 3, 3073, 360.46
Lilian Berton, 4, 16, 7.82