Title |
---|
A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors |
Abstract |
---|
Distributional semantic models represent the meaning of words as vectors. We introduce a selection method that learns a vector space in which each dimension corresponds to a natural word. The method starts from the most frequent words in the corpus and selects the subset that yields the best performance. Because every dimension is itself a word, the resulting space is directly interpretable; this is the main advantage of the method over fusion methods such as NMF and over neural embedding models. We apply the method to the ukWaC corpus and train a vector space with N=1500 basis words. We report test results on word similarity tasks for the MEN, RG-65, SimLex-999, and WordSim353 gold-standard datasets. The results show that reducing the number of basis vectors from 5000 to 1500 lowers accuracy by only about 1.5-2%, so we achieve good interpretability without a large penalty. Interpretability evaluations indicate that the word vectors obtained by the proposed method with N=1500 are more interpretable than those of word embedding models and the baseline method. We also report the top 15 words of the 1500 selected basis words. |
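The core idea described in the abstract, representing each word by its co-occurrence counts with a fixed set of frequent "basis" words so that every vector dimension is itself a natural word, can be sketched as follows. This is a minimal illustration under assumed details (a symmetric context window, raw counts, and the function name `build_basis_vectors` are hypothetical choices, not the paper's exact procedure or its performance-based subset selection):

```python
from collections import Counter

def build_basis_vectors(corpus, n_basis=3, window=1):
    """Sketch: represent each word by its co-occurrence counts with the
    n_basis most frequent words, so each dimension is a natural word."""
    tokens = [t for sent in corpus for t in sent]
    # Basis = the n_basis most frequent words (the paper then searches
    # for a best-performing subset; that step is omitted here).
    basis = [w for w, _ in Counter(tokens).most_common(n_basis)]
    index = {w: i for i, w in enumerate(basis)}
    vectors = {}
    for sent in corpus:
        for i, word in enumerate(sent):
            vec = vectors.setdefault(word, [0] * len(basis))
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in index:
                    vec[index[sent[j]]] += 1
    return basis, vectors

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
basis, vectors = build_basis_vectors(corpus)
```

Each entry of a word's vector can then be read off directly, e.g. "how often did *cat* co-occur with *the*", which is the interpretability property the abstract highlights.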
Year | DOI | Venue |
---|---|---|
2021 | 10.1613/jair.1.13353 | JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH |
DocType | Volume | Issue |
---|---|---|
Journal | 72 | 1 |
ISSN | Citations | PageRank |
---|---|---|
1076-9757 | 0 | 0.34 |
References | Authors |
---|---|
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Atefe Pakzad | 1 | 0 | 0.34 |
Morteza Analoui | 2 | 124 | 24.94 |