Title
A Stronger Baseline for Multilingual Word Embeddings.
Abstract
Levy, Søgaard and Goldberg's (2017) S-ID (sentence ID) method applies word2vec to tuples containing a sentence ID and a word from the sentence. It has been shown to be a strong baseline for learning multilingual embeddings. Inspired by recent work on concept-based embedding learning, we propose SC-ID, an extension to S-ID: given a sentence-aligned corpus, we use sampling to extract concepts that are then processed in the same manner as S-IDs. We perform experiments on the Parallel Bible Corpus across 1000+ languages and show that SC-ID yields up to a 6% performance increase in a word translation task. In addition, we provide evidence that SC-ID is easily and widely applicable by reporting competitive results across 8 tasks on a EuroParl-based corpus.
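The abstract describes S-ID as running word2vec over (sentence ID, word) tuples from a sentence-aligned corpus. The following minimal sketch, based only on that description and assuming gensim's Word2Vec, illustrates how such tuples could be built and trained on; the toy corpus, the language-prefix convention, and all parameter values are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the S-ID idea as described in the abstract (not the authors' code).
# Words from aligned sentences in different languages share a sentence-ID context token.
from gensim.models import Word2Vec

# Hypothetical toy corpus: sentence-aligned text keyed by a shared sentence ID.
aligned_corpus = {
    "S1": {"eng": "in the beginning", "deu": "am anfang"},
    "S2": {"eng": "let there be light", "deu": "es werde licht"},
}

# Each training "sentence" is a (sentence ID, word) tuple, as in the S-ID description.
# Prefixing words with a language code (to keep vocabularies apart) is an assumption here.
training_pairs = [
    [sent_id, f"{lang}:{word}"]
    for sent_id, translations in aligned_corpus.items()
    for lang, sentence in translations.items()
    for word in sentence.split()
]

# Standard skip-gram word2vec over these tuples; parameters are illustrative only.
model = Word2Vec(training_pairs, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv.most_similar("eng:light", topn=3))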
Year
2018
Venue
arXiv: Computation and Language
DocType
Journal
Volume
abs/1811.00586
Citations
1
PageRank
0.35
References
0
Authors
2
Name | Order | Citations | PageRank
Philipp Dufter | 1 | 1 | 4.74
Hinrich Schütze | 2 | 2113 | 362.21