Title
TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names
Abstract
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy. Most existing HMTC methods train classifiers using massive human-labeled documents, which are often too costly to obtain in real-world applications. In this paper, we explore to conduct HMTC based on only class surface names as supervision signals. We observe that to perform HMTC, human experts typically first pinpoint a few most essential classes for the document as its “core classes”, and then check core classes’ ancestor classes to ensure the coverage. To mimic human experts, we propose a novel HMTC framework, named TaxoClass. Specifically, TaxoClass (1) calculates document-class similarities using a textual entailment model, (2) identifies a document’s core classes and utilizes confident core classes to train a taxonomy-enhanced classifier, and (3) generalizes the classifier via multi-label self-training. Our experiments on two challenging datasets show TaxoClass can achieve around 0.71 Example-F1 using only class names, outperforming the best previous method by 25%.
Year
Venue
DocType
2021
NAACL-HLT
Conference
Citations 
PageRank 
References 
0
0.34
0
Authors
6
Name
Order
Citations
PageRank
Jiaming Shen1233.21
Wenda Qiu200.34
Yu Meng34911.09
Jingbo Shang48815.88
Xiang Ren588560.08
Jiawei Han6430853824.48