Abstract | ||
---|---|---|
Levy, S{o}gaard and Goldbergu0027s (2017) S-ID (sentence ID) method applies word2vec on tuples containing a sentence ID and a word from the sentence. It has been shown to be a strong baseline for learning multilingual embeddings. Inspired by recent work on concept based embedding learning we propose SC-ID, an extension to S-ID: given a sentence aligned corpus, we use sampling to extract concepts that are then processed in the same manner as S-IDs. We perform experiments on the Parallel Bible Corpus across 1000+ languages and show that SC-ID yields up to 6% performance increase in a word translation task. In addition, we provide evidence that SC-ID is easily and widely applicable by reporting competitive results across 8 tasks on a EuroParl based corpus. |
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Computation and Language | Journal |
Volume | Citations | PageRank |
abs/1811.00586 | 1 | 0.35 |
References | Authors | |
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Philipp Dufter | 1 | 1 | 4.74 |
Hinrich Schütze | 2 | 2113 | 362.21 |