Title
Improving Word Embedding Compositionality using Lexicographic Definitions.
Abstract
We present an in-depth analysis of four popular word embeddings (Word2Vec, GloVe, fastText and Paragram) in terms of their semantic compositionality. In addition, we propose a method to tune these embeddings towards better compositionality. We find that training the existing embeddings to compose lexicographic definitions significantly improves their performance on this task, while achieving similar or better performance in both word similarity and sentence embedding evaluations. Our method tunes word embeddings using a simple neural network architecture with definitions and lemmas from WordNet. Since dictionary definitions are semantically similar to their associated lemmas, they are ideal candidates both for our tuning method and for evaluating compositionality. Our architecture allows the embeddings to be composed using simple arithmetic operations, which makes them particularly suitable for production applications such as web search and data mining. We also explore more elaborate compositional models. In our analysis, we evaluate the original and tuned embeddings using existing word similarity and sentence embedding evaluation methods. Aside from these evaluation methods used in related work, we also evaluate embeddings using a ranking method that tests composed vectors using the lexicographic definitions mentioned above. In contrast to other evaluation methods, ours is not invariant to the magnitude of the embedding vector, which we show is important for composition. We consider this new evaluation method, called CompVecEval, to be a key contribution.
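The idea of composing embeddings with simple arithmetic, and of ranking candidates with a magnitude-sensitive measure, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vectors, the choice of summation as the composition function, Euclidean distance as the magnitude-sensitive measure, and the helper names `compose_sum` and `rank_of`. It is not the paper's CompVecEval procedure, whose details are not given in this record.

```python
import math

# Toy 3-dimensional embeddings; the values are invented for illustration only.
EMBEDDINGS = {
    "young": [0.9, 0.1, 0.0],
    "dog":   [0.1, 0.9, 0.0],
    "puppy": [1.0, 1.0, 0.0],
    "cat":   [0.0, 0.2, 0.9],
    "small": [0.3, 0.3, 0.0],  # same direction as "puppy", much smaller magnitude
}


def compose_sum(words):
    """Compose a phrase vector by element-wise addition of its word vectors."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    for word in words:
        for i, value in enumerate(EMBEDDINGS[word]):
            total[i] += value
    return total


def euclidean(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance: depends on direction only, not on vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_of(target, composed, candidates, distance):
    """Return the 1-based rank of `target` when candidates are sorted by distance."""
    ordered = sorted(candidates, key=lambda w: distance(composed, EMBEDDINGS[w]))
    return ordered.index(target) + 1


# Compose a toy "definition" and rank candidate lemmas against it.
composed = compose_sum(["young", "dog"])
print(rank_of("puppy", composed, ["puppy", "cat", "small"], euclidean))
```

In this toy setup, cosine distance cannot separate "puppy" from "small" because the two vectors point in the same direction, while Euclidean distance can. That is one way to see why a magnitude-sensitive evaluation may matter when vectors are composed by addition.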
Year: 2018
DOI: 10.1145/3178876.3186007
Venue: WWW '18: The Web Conference 2018, Lyon, France, April 2018
Field: Principle of compositionality, Embedding, Computer science, Distributional semantics, Natural language processing, Artificial intelligence, Word2vec, Word embedding, WordNet, Sentence, Machine learning, Feature learning
DocType: Conference
ISBN: 978-1-4503-5639-8
Citations: 1
PageRank: 0.38
References: 34
Authors: 3
Name                Order  Citations  PageRank
Thijs Scheepers     1      1          0.38
Evangelos Kanoulas  2      956        71.55
Efstratios Gavves   3      655        33.41