Abstract | ||
---|---|---|
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy. Most existing HMTC methods train classifiers on large numbers of human-labeled documents, which are often too costly to obtain in real-world applications. In this paper, we explore performing HMTC using only class surface names as supervision signals. We observe that, to perform HMTC, human experts typically first pinpoint a few of the most essential classes for the document as its “core classes”, and then check the core classes’ ancestor classes to ensure coverage. To mimic human experts, we propose a novel HMTC framework named TaxoClass. Specifically, TaxoClass (1) calculates document-class similarities using a textual entailment model, (2) identifies a document’s core classes and uses confident core classes to train a taxonomy-enhanced classifier, and (3) generalizes the classifier via multi-label self-training. Our experiments on two challenging datasets show that TaxoClass achieves around 0.71 Example-F1 using only class names, outperforming the best previous method by 25%. |
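
The abstract mentions two ideas that a small example can make concrete: completing a document's core classes with their ancestors in the taxonomy, and scoring predictions with Example-F1. The Python sketch below is illustrative only and is not the TaxoClass implementation. The function names, the `parents` mapping, and the toy taxonomy are hypothetical. Example-F1 is computed under the common per-document definition, 2|T ∩ P| / (|T| + |P|) averaged over documents, which is assumed here rather than taken from this record.

```python
from typing import Dict, List, Set


def with_ancestors(core: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the core classes plus every ancestor reachable via `parents`.

    `parents` maps a class to its direct parents, so a class may have more
    than one parent (the taxonomy can be a DAG, not only a tree).
    """
    labels: Set[str] = set()
    stack = list(core)
    while stack:
        cls = stack.pop()
        if cls in labels:
            continue  # already visited through another path
        labels.add(cls)
        stack.extend(parents.get(cls, []))
    return labels


def example_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Average per-document F1 between true and predicted label sets."""
    scores = []
    for true, pred in zip(true_sets, pred_sets):
        denom = len(true) + len(pred)
        # Assumption: two empty label sets count as a perfect match.
        scores.append(2 * len(true & pred) / denom if denom else 1.0)
    return sum(scores) / len(scores)


# Hypothetical toy taxonomy: laptops -> computers -> electronics <- cameras
parents = {
    "laptops": ["computers"],
    "computers": ["electronics"],
    "cameras": ["electronics"],
    "electronics": [],
}

# Predicted label sets obtained from (assumed) core classes of two documents.
predictions = [
    with_ancestors({"laptops"}, parents),
    with_ancestors({"electronics"}, parents),
]
gold = [
    {"laptops", "computers", "electronics"},
    {"cameras", "electronics"},
]

score = example_f1(gold, predictions)
print(f"Example-F1 on the toy data: {score:.4f}")
```

In this toy run the first document is labeled perfectly, while the second misses `cameras`, so its per-document F1 is 2·1 / (2 + 1). How core classes are chosen from the entailment-based similarities is not covered by this sketch.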
Year | Venue | DocType |
---|---|---|
2021 | NAACL-HLT | Conference |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jiaming Shen | 1 | 23 | 3.21 |
Wenda Qiu | 2 | 0 | 0.34 |
Yu Meng | 3 | 49 | 11.09 |
Jingbo Shang | 4 | 88 | 15.88 |
Xiang Ren | 5 | 885 | 60.08 |
Jiawei Han | 6 | 43085 | 3824.48 |