Title
On the impact of knowledge-based linguistic annotations in the quality of scientific embeddings
Abstract
In essence, embedding algorithms work by optimizing the distance between a word and its usual context in order to generate an embedding space that encodes the distributional representation of words. In addition to single words or word pieces, other features that result from the linguistic analysis of text, including lexical, grammatical and semantic information, can be used to improve the quality of embedding spaces. However, until now we have lacked a precise understanding of the impact that such individual annotations and their possible combinations may have on the quality of the embeddings. In this paper, we conduct a comprehensive study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus and quantify their impact on the resulting representations. Our results show how the effect of such annotations on the embeddings varies depending on the evaluation task. In general, we observe that learning embeddings using linguistic annotations helps achieve better evaluation results.
Year
2021
DOI
10.1016/j.future.2021.02.019
Venue
Future Generation Computer Systems
Keywords
Natural language processing, Linguistic analysis, Knowledge graphs, Embeddings
DocType
Journal
Volume
120
ISSN
0167-739X
Citations
1
PageRank
0.35
References
0
Authors
3
Name | Order | Citations | PageRank
Andres Garcia-Silva | 1 | 1 | 0.35
Ronald Denaux | 2 | 153 | 14.39
José Manuél Gómez-Pérez | 3 | 9 | 6.77