Title
Understanding a bag of words by conceptual labeling with prior weights
Abstract
In many natural language processing tasks, e.g., text classification or information extraction, the weighted bag-of-words model is widely used to represent the semantics of text, where the importance of each word is quantified by its weight. However, it is still difficult for machines to understand a weighted bag of words (WBoW) without explicit explanations, which seriously limits its application in downstream tasks. To make a machine better understand a WBoW, we introduce the task of conceptual labeling, which aims at generating the minimum number of concepts as labels to explicitly represent and explain the semantics of a WBoW. Specifically, we first propose three principles for label generation and then model each principle as an objective function. To satisfy the three principles simultaneously, a multi-objective optimization problem is solved. In our framework, a taxonomy (i.e., Microsoft Concept Graph) is used to provide high-quality candidate concepts, and a corresponding search algorithm is proposed to derive the optimal solution (i.e., a small set of proper concepts as labels). Furthermore, two pruning strategies are also proposed to reduce the search space and improve the performance. Our experiments and results prove that the proposed method is capable of generating proper labels for WBoWs. Besides, we also apply the generated labels to the task of text classification and observe an increase in performance, which further justifies the effectiveness of our conceptual labeling framework.
Year: 2020
DOI: 10.1007/s11280-020-00806-x
Venue: World Wide Web
Keywords: Conceptual labeling, Microsoft concept graph, Weighted bag of words, Multi-objective optimization, Concept pruning
DocType: Journal
Volume: 23
Issue: 4
ISSN: 1386-145X
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name            Order  Citations  PageRank
Haiyun Jiang    1      3          2.76
Deqing Yang     2      29         9.69
Yanghua Xiao    3      482        54.90
Wei Wang        4      7122       746.33