Title
Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings
Abstract
The Word2Vec model is a neural network-based unsupervised word embedding technique widely used in applications such as natural language processing, bioinformatics, and graph mining. Because Word2Vec repeatedly performs Stochastic Gradient Descent (SGD) to minimize the objective function, it is very compute-intensive. However, existing methods for parallelizing Word2Vec are not sufficiently optimized for data locality to achieve high performance. In this paper, we develop a parallel data-locality-enhanced Word2Vec algorithm based on Skip-gram with a novel negative sampling method that decouples the loss calculation for positive and negative samples; this allows the negative-sample computation over an entire sentence to be efficiently reformulated as matrix-matrix operations. Experimental results demonstrate that our parallel implementations on multi-core CPUs and GPUs achieve significant performance improvements over existing state-of-the-art parallel Word2Vec implementations while maintaining evaluation quality. We also show the utility of our Word2Vec implementation within the Node2Vec algorithm, accelerating embedding learning for large graphs.
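The core idea in the abstract, sharing one set of negative samples across all positive (center, context) pairs of a sentence so that the negative-sample loss reduces to matrix-matrix products, can be illustrated with a short sketch. The following is a minimal NumPy illustration under stated assumptions, not the authors' implementation; the function name, array shapes, and learning rate are assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_shared_negatives_step(W_in, W_out, centers, contexts, negatives, lr=0.025):
    """One SGD step over all (center, context) pairs of a sentence.

    A single set of K negative samples is shared by every positive pair,
    so the negative-sample scores for the whole sentence form one
    (B x K) = (B x D) @ (D x K) matrix-matrix product instead of
    B * K separate dot products.
    """
    h   = W_in[centers]       # (B, D) input vectors of the center words
    pos = W_out[contexts]     # (B, D) output vectors of the true contexts
    neg = W_out[negatives]    # (K, D) output vectors shared by all pairs

    pos_score = sigmoid(np.einsum('bd,bd->b', h, pos))  # (B,) row-wise dots
    neg_score = sigmoid(h @ neg.T)                      # (B, K) via one GEMM

    # Gradients of the skip-gram negative-sampling objective
    g_pos  = (pos_score - 1.0)[:, None]      # (B, 1): positives have label 1
    grad_h = g_pos * pos + neg_score @ neg   # (B, D) via a second GEMM

    # Scatter updates; np.add.at accumulates correctly for repeated indices
    np.add.at(W_out, contexts, -lr * g_pos * h)
    np.add.at(W_out, negatives, -lr * (neg_score.T @ h))  # (K, D) GEMM
    np.add.at(W_in, centers, -lr * grad_h)

# Example usage (all sizes are illustrative assumptions):
rng = np.random.default_rng(0)
V, D, B, K = 1000, 100, 8, 5                 # vocab, dim, pairs, negatives
W_in  = (rng.random((V, D)) - 0.5) / D
W_out = np.zeros((V, D))
centers   = rng.integers(0, V, size=B)
contexts  = rng.integers(0, V, size=B)
negatives = rng.integers(0, V, size=K)       # one negative set per sentence
sgns_shared_negatives_step(W_in, W_out, centers, contexts, negatives)
```

Because the (B x K) score matrix and its gradients come from dense matrix-matrix products, the inner loop is cache-friendly and maps onto BLAS-style kernels on CPUs and GPUs, which is the kind of data-locality benefit the paper targets.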
Year: 2019
DOI: 10.1109/MLHPC49564.2019.00010
Venue: 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
Keywords: performance improvement, GPU, multicore CPU, matrix-matrix operations, Skip-gram, stochastic gradient descent method, graph embeddings, Node2Vec algorithm, Word2Vec implementation, negative sampling method, parallel data-locality-enhanced Word2Vec algorithm, neural network-based unsupervised word embedding technique, Word2Vec model, Word2Vec embeddings, parallel data-local training
Field: Locality, Stochastic gradient descent, Embedding, Computer science, Parallel computing, Sampling (statistics), Word2vec, Word embedding, Artificial neural network, Performance improvement
DocType: Conference
ISBN: 978-1-7281-5986-7
Citations: 0
PageRank: 0.34
References: 13
Authors: 6