Abstract | ||
---|---|---|
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods. |
Year | Venue | Field |
---|---|---|
2016 | arXiv: Computation and Language | Embedding, Computer science, Natural language processing, Artificial intelligence, Parsing, Text categorization, Machine learning |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1602.01925 | 45 |
PageRank | References | Authors |
---|---|---|
1.33 | 24 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Waleed Ammar | 1 | 339 | 18.48 |
George Mulcaire | 2 | 89 | 3.42 |
Yulia Tsvetkov | 3 | 347 | 33.83 |
Guillaume Lample | 4 | 651 | 22.75 |
Chris Dyer | 5 | 5438 | 232.28 |
Noah A. Smith | 6 | 5867 | 314.27 |