Title
A Study of Methods for the Generation of Domain-Aware Word Embeddings
Abstract
Word embeddings are essential components of many text data applications. Most work uses "out-of-the-box" embeddings trained on general text corpora, but these can be less effective in domain-specific settings. How to create "domain-aware" word embeddings is therefore an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings from both general and domain-specific text corpora: concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Although the investigated strategies are designed for domain-specific tasks, they are general enough to be applied to any domain and are not tied to a single task. Experimental results show that all three methods can work well; however, the interpolation method consistently works best.
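The three strategies named in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Embeddings are assumed to be plain dicts mapping words to NumPy vectors, and the general and domain spaces are assumed to share a dimension for interpolation. Alignment uses orthogonal Procrustes over the shared vocabulary, which is a common choice; the abstract does not say which alignment method the paper uses. The helper `fuse_corpora` and its integer `domain_weight` are hypothetical and reflect one possible reading of "weighted fusion of text data": up-sampling the domain corpus before training a single model. The interpolation weight `lam` is likewise illustrative.

```python
import numpy as np

def concatenate(general, domain, words):
    """Concatenation: stack each word's general and domain vectors.
    Assumes both tables contain every word in `words`."""
    return {w: np.concatenate([general[w], domain[w]]) for w in words}

def fuse_corpora(general_docs, domain_docs, domain_weight):
    """Hypothetical weighted fusion of text data: repeat each domain
    document `domain_weight` times, then train one model on the result."""
    return list(general_docs) + [d for d in domain_docs for _ in range(domain_weight)]

def procrustes_align(source, target, anchors):
    """Orthogonal W minimizing ||X W - Y||_F, where the rows of X come
    from `source` and the rows of Y from `target`, over the anchor words."""
    X = np.stack([source[w] for w in anchors])
    Y = np.stack([target[w] for w in anchors])
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def interpolate(general, domain, words, lam):
    """Interpolation: map domain vectors into the general space, then
    mix them as lam * aligned_domain + (1 - lam) * general."""
    shared = [w for w in words if w in general and w in domain]
    W = procrustes_align(domain, general, shared)
    return {w: lam * (domain[w] @ W) + (1 - lam) * general[w] for w in shared}

# Toy sanity check: the "domain" space is just a rotated copy of the
# general space, so alignment should undo the rotation exactly.
rng = np.random.default_rng(0)
words = ["patient", "dose", "trial", "cell", "gene", "model"]
general = {w: rng.normal(size=4) for w in words}
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal rotation
domain = {w: general[w] @ Q for w in words}
mixed = interpolate(general, domain, words, lam=0.5)
print(np.allclose(mixed["dose"], general["dose"]))  # prints True
```

Orthogonal alignment preserves distances and angles within the domain space. That is one reason it is a common choice before mixing two independently trained embedding spaces. With real data, the domain vectors carry different neighborhoods, so the interpolated vectors would differ from both inputs.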
Year: 2020
DOI: 10.1145/3397271.3401287
Venue: SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020
DocType: Conference
ISBN: 978-1-4503-8016-4
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name | Order | Citations | PageRank
Dominic Seyler | 1 | 9 | 4.29
ChengXiang Zhai | 2 | 11908 | 649.74