Title
Explaining a bag of words with hierarchical conceptual labels
Abstract
In natural language processing and information retrieval tasks, the bag-of-words model is widely used to represent the semantics of texts. However, without an explicit semantic explanation, it is difficult for machines to fully understand a bag of words, and hence the text it represents, which limits the power of the bag-of-words model in many scenarios. In this paper, we introduce the task of hierarchical conceptual labeling (HCL), which aims to generate a set of hierarchically organized conceptual labels that explicitly explain the semantics of a bag of words; the candidate labels are drawn from a large-scale knowledge base, namely Microsoft Concept Graph. To this end, we first propose a denoising algorithm that removes noise words from a bag of words. Hierarchical conceptual labels are then generated for the cleaned bag of words using a hierarchical clustering algorithm, Bayesian rose trees. We conduct extensive experiments and show that (1) the proposed denoising algorithm can effectively delete noise words from a bag of words, and (2) the algorithm based on Bayesian rose trees can generate hierarchical conceptual labels for a bag of words with high accuracy.
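The abstract describes a two-stage pipeline, denoising followed by hierarchical labeling, without giving the algorithms themselves. The Python sketch that follows illustrates only the shape of that pipeline, under loudly stated assumptions: `TOY_ISA` is a hypothetical, hand-made word-to-concept table with invented probabilities (not Microsoft Concept Graph data); `denoise` applies a simple stand-in rule (drop a word that shares no concept with any other word), not the paper's denoising algorithm; and `hierarchical_labels` swaps in plain greedy agglomerative clustering for Bayesian rose trees, labeling each merged cluster with the concept that covers all of its words. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy isA table: word -> {concept: P(concept | word)}.
# The entries and probabilities are invented for illustration;
# they are NOT taken from Microsoft Concept Graph.
TOY_ISA = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "banana": {"fruit": 0.9, "food": 0.1},
    "orange": {"fruit": 0.7, "color": 0.3},
    "google": {"company": 0.8, "search engine": 0.2},
    "microsoft": {"company": 0.9, "software vendor": 0.1},
    "guitar": {"instrument": 1.0},
}


def denoise(words, isa):
    """Stand-in denoising rule (not the paper's algorithm): keep a word
    only if it shares at least one concept with another word in the bag."""
    kept = []
    for w in words:
        concepts = set(isa.get(w, {}))
        if any(concepts & set(isa.get(v, {})) for v in words if v != w):
            kept.append(w)
    return kept


def best_label(words, isa):
    """Return (concept, score) for the concept shared by every word, scored
    by its weakest P(concept | word); (None, 0.0) if no concept is shared."""
    shared = set.intersection(*(set(isa.get(w, {})) for w in words))
    if not shared:
        return None, 0.0
    # Iterate in sorted order so ties are broken deterministically.
    return max(((c, min(isa[w][c] for w in words)) for c in sorted(shared)),
               key=lambda pair: pair[1])


def hierarchical_labels(words, isa):
    """Greedy agglomerative labeling, a simplification swapped in for
    Bayesian rose trees. Nodes are dicts with "label", "words", "children";
    leaves have label None. Returns the list of root nodes."""
    nodes = [{"label": None, "words": [w], "children": []} for w in words]
    while len(nodes) > 1:
        # Score each pair of current nodes by the best label of their union.
        scored = []
        for i, j in combinations(range(len(nodes)), 2):
            label, score = best_label(nodes[i]["words"] + nodes[j]["words"], isa)
            if label is not None:
                scored.append((score, i, j, label))
        if not scored:
            break  # no concept covers any pair of nodes, so stop merging
        _, i, j, label = max(scored, key=lambda t: t[0])
        children = []
        for node in (nodes[i], nodes[j]):
            # Absorb a child that already carries the same label, so a node
            # can end up with more than two children, as in a rose tree.
            if node["label"] == label:
                children.extend(node["children"])
            else:
                children.append(node)
        merged = {"label": label,
                  "words": nodes[i]["words"] + nodes[j]["words"],
                  "children": children}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes


def render(node, depth=0):
    """Return indented text lines: concept labels for inner nodes, words for leaves."""
    lines = ["  " * depth + (node["label"] or node["words"][0])]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


bag = ["apple", "banana", "orange", "google", "microsoft", "guitar"]
clean = denoise(bag, TOY_ISA)
for root in hierarchical_labels(clean, TOY_ISA):
    print("\n".join(render(root)))
```

On this toy input, the sketch drops `guitar` as noise and prints a `company` node over `google` and `microsoft`, followed by a `fruit` node over `apple`, `banana`, and `orange`. Absorbing a same-labeled child only loosely mimics the multi-child nodes of a rose tree; an actual Bayesian rose tree chooses among join, absorb, and collapse operations by comparing marginal likelihoods.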
Year: 2020
DOI: 10.1007/s11280-019-00752-3
Venue: World Wide Web
Keywords: Hierarchical conceptual labeling, Microsoft Concept Graph, Bayesian rose trees, Hierarchical clustering
DocType: Journal
Volume: 23
Issue: 3
ISSN: 1386-145X
Citations: 1
PageRank: 0.36
References: 0
Authors: 3
Name: Haiyun Jiang; Order: 1; Citations: 3; PageRank: 2.76
Name: Yanghua Xiao; Order: 2; Citations: 482; PageRank: 54.90
Name: Wei Wang; Order: 3; Citations: 7122; PageRank: 746.33