Title
Incorporating Domain Knowledge In Learning Word Embedding
Abstract
Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent years to boost the performance of a variety of NLP tasks such as named entity recognition, syntactic parsing and sentiment analysis. Classic word embedding methods such as Word2Vec and GloVe work well when given a large text corpus, but when the input texts are sparse, as in many specialized domains (e.g., cybersecurity), they often fail to produce high-quality vectors. In this paper, we describe a novel method, called Annotation Word Embedding (AWE), for training domain-specific word embeddings from sparse texts. Our method is generic and can leverage diverse types of domain knowledge such as domain vocabulary, semantic relations and attribute specifications. Specifically, it encodes these types of domain knowledge as text annotations and incorporates the annotations into word embedding training. We evaluated AWE on two cybersecurity applications: identifying malware aliases and identifying relevant Common Vulnerabilities and Exposures (CVEs). The evaluation results demonstrate the effectiveness of our method over state-of-the-art baselines.
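The abstract describes AWE only at a high level: domain knowledge is encoded as text annotations, which are then incorporated into embedding training. As a rough illustration of that general idea (not the paper's actual AWE algorithm), the sketch below inserts annotation tags as pseudo-tokens next to the terms they describe and trains an off-the-shelf skip-gram model with gensim; the lexicon entries, tag names, and toy sentences are all illustrative assumptions.

```python
# Minimal sketch, assuming a pseudo-token encoding of annotations:
# terms that share an annotation tag (e.g., two malware aliases) also
# share that tag as context, which pulls their vectors together even
# when the raw corpus is sparse.
from gensim.models import Word2Vec  # assumes gensim 4.x

# Hypothetical domain lexicon: surface form -> annotation tag.
domain_lexicon = {
    "zeus": "ENT_MALWARE",
    "zbot": "ENT_MALWARE",
    "cve-2017-0144": "ENT_CVE",
}

def annotate(sentence):
    """Insert the annotation tag as an extra token after each known term."""
    out = []
    for token in sentence:
        out.append(token)
        tag = domain_lexicon.get(token.lower())
        if tag:
            out.append(tag)
    return out

# Toy corpus standing in for sparse domain text.
corpus = [
    ["zeus", "is", "also", "known", "as", "zbot"],
    ["the", "exploit", "targets", "cve-2017-0144"],
]
annotated = [annotate(s) for s in corpus]

# Train skip-gram embeddings on the annotated corpus; the annotation
# tags receive vectors too and act as shared context for their terms.
model = Word2Vec(annotated, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("zeus", topn=3))
```

Under this encoding, "zeus" and "zbot" co-occur with the same ENT_MALWARE pseudo-token, so their vectors converge faster than raw co-occurrence statistics alone would allow; the paper's AWE method presumably achieves a similar effect through its own training objective.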
Year
2019
DOI
10.1109/ICTAI.2019.00226
Venue
2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI 2019)
Keywords
Word embedding, Domain knowledge, Cyber security
Field
Embedding, Domain knowledge, Sentiment analysis, Computer science, Text corpus, Artificial intelligence, Natural language processing, Word2vec, Word embedding, Named-entity recognition, Vocabulary, Machine learning
DocType
Conference
ISSN
1082-3409
Citations
0
PageRank
0.34
References
0
Authors
3
Name          Order  Citations  PageRank
Arpita Roy    1      1          44.39
Youngja Park  2      219        24.84
Shimei Pan    3      684        64.41