Abstract |
---|
In natural language processing and information retrieval tasks, the bag-of-words model is widely used to represent the semantics of texts. However, without an explicit semantic explanation, it is difficult for machines to fully understand a bag of words or the text it represents, which limits the usefulness of the bag-of-words model in many scenarios. In this paper, we introduce the task of hierarchical conceptual labeling (HCL), which aims to generate a hierarchically organized set of conceptual labels that explicitly explains the semantics of a bag of words. The candidate labels are drawn from a large-scale knowledge base, namely Microsoft Concept Graph. To this end, we first propose a denoising algorithm that filters out noise words from a bag of words. We then generate hierarchical conceptual labels for the cleaned bag of words using a hierarchical clustering algorithm, Bayesian rose trees. Extensive experiments show that (1) the proposed denoising algorithm effectively removes noise words from a bag of words, and (2) the algorithm based on Bayesian rose trees generates hierarchical conceptual labels for a bag of words with high accuracy. |
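The two-stage pipeline described in the abstract (denoise the bag of words, then build a hierarchy of concept labels over the clean words) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the paper. `CONCEPT_GRAPH` is a hypothetical hand-made toy dictionary standing in for Microsoft Concept Graph, and its typicality scores are invented. `denoise` uses a simple neighbour-similarity threshold rather than the paper's denoising algorithm. `build_forest` is plain greedy agglomerative clustering with cosine similarity, swapped in for Bayesian rose trees, which instead score merges by marginal likelihood and allow nodes with more than two children. All function names and thresholds are illustrative.

```python
import math

# HYPOTHETICAL toy stand-in for Microsoft Concept Graph:
# word -> {concept: typicality score}. All scores are invented for illustration.
CONCEPT_GRAPH = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "banana": {"fruit": 0.9, "crop": 0.1},
    "mango": {"fruit": 0.8, "crop": 0.2},
    "google": {"company": 0.7, "search engine": 0.3},
    "ibm": {"company": 0.9, "brand": 0.1},
    "the": {},  # a stop word with no concepts
}


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def denoise(words, threshold=0.2):
    """Drop words whose concepts barely overlap with any other word's concepts.
    Hypothetical neighbour-similarity rule, not the paper's denoising algorithm."""
    vecs = {w: CONCEPT_GRAPH.get(w, {}) for w in words}
    return [
        w for w in words
        if max((cosine(vecs[w], vecs[o]) for o in words if o != w), default=0.0) >= threshold
    ]


def build_forest(words, min_sim=0.3):
    """Greedy binary agglomerative clustering; each internal node is labeled with
    its highest-scoring concept. Swapped in for Bayesian rose trees (BRT), which
    score merges by marginal likelihood and allow more than two children per node."""
    nodes = [{"label": w, "vec": CONCEPT_GRAPH.get(w, {}), "children": []} for w in words]
    while len(nodes) > 1:
        pairs = [(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))]
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]]["vec"], nodes[p[1]]["vec"]))
        if cosine(nodes[i]["vec"], nodes[j]["vec"]) < min_sim:
            break  # remaining clusters are unrelated; keep them as separate roots
        u, v = nodes[i]["vec"], nodes[j]["vec"]
        # Sum the two concept vectors; sorted keys keep tie-breaking deterministic.
        vec = {c: u.get(c, 0.0) + v.get(c, 0.0) for c in sorted(set(u) | set(v))}
        label = max(vec, key=vec.get, default="unlabeled")
        parent = {"label": label, "vec": vec, "children": [nodes[i], nodes[j]]}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes


def render(node, depth=0):
    """Indented text view; concept labels are shown in brackets, words as-is."""
    text = f"[{node['label']}]" if node["children"] else node["label"]
    lines = ["  " * depth + text]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    bag = ["apple", "banana", "mango", "google", "ibm", "the"]
    for root in build_forest(denoise(bag)):
        print("\n".join(render(root)))
```

With this toy data, the concept-less stop word `the` is filtered out, and the ambiguous word `apple` ends up under a `[fruit]` label because its neighbours `banana` and `mango` pull it there, while `google` and `ibm` form a separate `[company]` cluster.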
Field | Value |
---|---|
Year | 2020 |
DOI | 10.1007/s11280-019-00752-3 |
Venue | World Wide Web |
Keywords | Hierarchical conceptual labeling, Microsoft Concept Graph, Bayesian rose trees, Hierarchical clustering |
Document type | Journal |
Volume | 23 |
Issue | 3 |
ISSN | 1386-145X |
Citations | 1 |
PageRank | 0.36 |
References | 0 |
Authors | 3 |
Name | Author order | Author citations | Author PageRank |
---|---|---|---|
Haiyun Jiang | 1 | 3 | 2.76 |
Yanghua Xiao | 2 | 482 | 54.90 |
Wei Wang | 3 | 7122 | 746.33 |