Title
On the Reproducibility and Generalisation of the Linear Transformation of Word Embeddings.
Abstract
Linear transformation is a way to learn a linear relationship between two word embeddings, such that words in the two different embedding spaces can be semantically related. In this paper, we examine the reproducibility and generalisation of the linear transformation of word embeddings. Linear transformation is particularly useful when translating word embedding models in different languages, since it can capture the semantic relationships between two models. We first reproduce two linear transformation approaches, a recent one using orthogonal transformation and the original one using simple matrix transformation. Previous findings on a machine translation task are re-examined, validating that linear transformation is indeed an effective way to transform word embedding models in different languages. In particular, we show that the orthogonal transformation can better relate the different embedding models. Following the verification of previous findings, we then study the generalisation of linear transformation in a multi-language Twitter election classification task. We observe that the orthogonal transformation outperforms the matrix transformation. In particular, it significantly outperforms the random classifier by at least 10% under the F1 metric across English and Spanish datasets. In addition, we also provide best practices when using linear transformation for multi-language Twitter election classification.
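The two approaches named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so this assumes the common setup: the "simple matrix transformation" is taken to be an unconstrained least-squares fit of W in XW ≈ Y, and the "orthogonal transformation" is taken to be the orthogonal Procrustes solution obtained from an SVD. The function names and the random toy data are hypothetical and stand in for real source and target embedding matrices whose rows are aligned word pairs.

```python
import numpy as np


def fit_matrix_transformation(X, Y):
    """Unconstrained least-squares W minimising ||X W - Y||_F (assumed formulation)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def fit_orthogonal_transformation(X, Y):
    """Orthogonal W minimising ||X W - Y||_F (orthogonal Procrustes, assumed formulation)."""
    # The optimum under the constraint W^T W = I is U V^T, where U S V^T = SVD(X^T Y).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy data (hypothetical): 50 aligned "word pairs" in a 4-dimensional space.
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(50, d))                        # source-language embeddings
Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # a random orthogonal map
Y = X @ Q_true                                      # target-language embeddings

W_matrix = fit_matrix_transformation(X, Y)
W_orth = fit_orthogonal_transformation(X, Y)

# Map a source vector into the target space with either transformation.
mapped = X[0] @ W_orth
```

On this noise-free toy data both fits recover the same map. On real embeddings they generally differ, because the orthogonal constraint preserves vector norms and angles while the unconstrained fit does not.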
Year: 2018
DOI: 10.1007/978-3-319-76941-7_20
Venue: ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)
Keywords: Embedding, Linear transformation, Twitter classification
Field: Data mining, Embedding, Orthogonal transformation, Computer science, Generalization, Machine translation, Theoretical computer science, Linear map, Word embedding, Transformation matrix, Classifier (linguistics)
DocType: Conference
Volume: 10772
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 13
Authors: 5
Name | Order | Citations | PageRank
Xiao Yang | 1 | 14 | 1.69
Iadh Ounis | 2 | 3438 | 234.59
Richard McCreadie | 3 | 403 | 32.43
Craig Macdonald | 4 | 2588 | 178.50
Anjie Fang | 5 | 35 | 5.93