Title
Seeds: Sampling-Enhanced Embeddings
Abstract
Finding a desirable sampling estimator has a profound impact on the development of static word embedding models, such as continue-bag-of-words (CBOW) and skip gram (SG), which have been generally accepted as popular low-resource algorithms to generate task-agnostic word representations. Due to the prevalence of large-scale pretrained models, less attention has been paid to these static models in the recent years. However, compared with the dynamic embedding models (e.g., BERT), these static models are straightforward to interpret, cost effective to train, and out-of-box to deploy, thus are still widely used in various downstream models until now. Therefore, it is still of considerable significance to study and improve them, especially the crucial components shared by these static models. In this article, we focus on negative sampling (NS), a key component shared by the sampling-based static models, by investigating and mitigating some critical problems of the sampling core. Concretely, we propose Seeds, a sampling enhanced embedding framework, to learn static word embeddings by a new algorithmic innovation for replacing the NS estimator, in which multifactor global priors are considered dynamically for different training pairs. Then, we implement this framework by four concrete models. For the first two implementations, namely CBOW-GP and SG-GP, both negative words and positive auxiliaries are sampled. And for the other two implementations, CBOW-GN and SG-GN, estimations are simplified by sampling only the negative instances. Extensive experimental results across a variety of standard intrinsic and extrinsic tasks demonstrate that embeddings learned by the proposed models outperform their NS-based counterparts, such as CBOW-NS and SG-NS, as well as other strong baselines.
Year
DOI
Venue
2022
10.1109/TNNLS.2020.3028099
IEEE Transactions on Neural Networks and Learning Systems
Keywords
DocType
Volume
Natural language processing,negative sampling (NS),word embedding,word representation
Journal
33
Issue
ISSN
Citations 
2
2162-237X
0
PageRank 
References 
Authors
0.34
7
3
Name
Order
Citations
PageRank
Ning Gong100.68
Nianmin Yao215921.57
Shun Guo300.34