Abstract
Word embeddings are essential components of many text data applications. In most work, "out-of-the-box" embeddings trained on general text corpora are used, but they can be less effective when applied to domain-specific settings. Thus, how to create "domain-aware" word embeddings is an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings based on both general and domain-specific text corpora, including concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Even though the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well; however, the interpolation method consistently works best.
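As a rough illustration of two of the vector-level strategies named in the abstract (concatenation and interpolation of aligned embeddings), a minimal NumPy sketch might look as follows. This is not the authors' implementation: the alignment step the paper mentions for interpolation is assumed to have been done already, and all names (`general_vec`, `domain_vec`, `alpha`) are hypothetical.

```python
# Minimal sketch of vector concatenation and linear interpolation of
# general-corpus and domain-corpus embeddings for a single word.
# Assumes both vectors live in (already aligned) spaces; names are illustrative.
import numpy as np


def concat_embedding(general_vec: np.ndarray, domain_vec: np.ndarray) -> np.ndarray:
    """Concatenate the general-corpus and domain-corpus vectors for a word."""
    return np.concatenate([general_vec, domain_vec])


def interpolate_embedding(general_vec: np.ndarray,
                          domain_vec: np.ndarray,
                          alpha: float = 0.5) -> np.ndarray:
    """Linearly interpolate two aligned vectors; alpha weights the domain side."""
    return (1.0 - alpha) * general_vec + alpha * domain_vec


# Toy 4-dimensional vectors for one word.
g = np.array([0.1, 0.2, 0.3, 0.4])
d = np.array([0.4, 0.3, 0.2, 0.1])
print(concat_embedding(g, d).shape)      # (8,) -- doubled dimensionality
print(interpolate_embedding(g, d, 0.7))  # domain-weighted mixture, same dimensionality
```

Concatenation doubles the embedding dimensionality, whereas interpolation keeps it fixed but requires the two spaces to be aligned before mixing; the weighted-fusion strategy operates on the text data itself rather than on the trained vectors.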
Year | DOI | Venue
---|---|---|
2020 | 10.1145/3397271.3401287 | SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020
DocType | ISBN | Citations
---|---|---|
Conference | 978-1-4503-8016-4 | 0

PageRank | References | Authors
---|---|---|
0.34 | 0 | 2
Name | Order | Citations | PageRank |
---|---|---|---|
Dominic Seyler | 1 | 9 | 4.29 |
ChengXiang Zhai | 2 | 11908 | 649.74 |