Abstract | ||
---|---|---|
Bilingual word embedding, which maps the word embeddings of two languages into a single vector space, has been widely applied in machine translation, word sense disambiguation, and other tasks. However, no model has been universally accepted for learning bilingual word embeddings. In this work, we propose a novel model named CJ-BOC to learn Chinese-Japanese word embeddings. Because Chinese and Japanese share a large number of common characters, we exploit them in our training process. We demonstrate the effectiveness of this exploitation through both theoretical and experimental study. To evaluate the performance of CJ-BOC, we conducted a comprehensive experiment, which reveals its speed advantage and the high quality of the acquired word embeddings. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-47650-6_7 | Lecture Notes in Artificial Intelligence |
Keywords | Field | DocType |
Bilingual word embedding, Distributed representation, Common characters, Chinese-Japanese | Vector space, Computer science, Machine translation, Exploit, Speech recognition, Natural language processing, Artificial intelligence, Word embedding, Distributed representation, Word-sense disambiguation | Conference |
Volume | ISSN | Citations |
9983 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 1 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wang Jilei | 1 | 0 | 0.34 |
Shiying Luo | 2 | 5 | 3.44 |
Li Yanning | 3 | 0 | 0.34 |
Xia Shu-Tao | 4 | 342 | 75.29 |