TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names - Citegraph

Paper Info

Title
TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names

Abstract
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy. Most existing HMTC methods train classifiers using massive human-labeled documents, which are often too costly to obtain in real-world applications. In this paper, we explore conducting HMTC using only class surface names as supervision signals. We observe that to perform HMTC, human experts typically first pinpoint the few most essential classes for the document as its “core classes”, and then check the core classes’ ancestor classes to ensure coverage. To mimic human experts, we propose a novel HMTC framework, named TaxoClass. Specifically, TaxoClass (1) calculates document-class similarities using a textual entailment model, (2) identifies a document’s core classes and utilizes confident core classes to train a taxonomy-enhanced classifier, and (3) generalizes the classifier via multi-label self-training. Our experiments on two challenging datasets show that TaxoClass can achieve around 0.71 Example-F1 using only class names, outperforming the best previous method by 25%.
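
The core-class idea described in the abstract can be illustrated with a minimal Python sketch. This is not the TaxoClass implementation: the toy taxonomy, the similarity scores (hard-coded stand-ins for what an entailment model might output), the 0.5 threshold, and all function names are assumptions made purely for illustration. The sketch selects high-scoring classes as core classes, adds their ancestors for coverage, and computes Example-F1 as the per-document F1 averaged over documents.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: each class maps to its parent classes (roots map to []).
PARENTS: Dict[str, List[str]] = {
    "science": [],
    "sports": [],
    "physics": ["science"],
    "quantum mechanics": ["physics"],
    "soccer": ["sports"],
}


def ancestors(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `cls` by walking parent links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def label_with_ancestors(
    scores: Dict[str, float],
    parents: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Set[str]:
    """Pick classes scoring at or above `threshold` as core classes,
    then add all of their ancestors to the label set."""
    core = {c for c, s in scores.items() if s >= threshold}
    labels = set(core)
    for c in core:
        labels |= ancestors(c, parents)
    return labels


def example_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Average over documents of 2*|T & P| / (|T| + |P|).
    A document with empty true and predicted sets scores 0.0 here (an assumed convention)."""
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        if t or p:
            total += 2 * len(t & p) / (len(t) + len(p))
    return total / len(true_sets)


# Made-up document-class similarity scores for a single document.
scores = {
    "quantum mechanics": 0.9,
    "physics": 0.4,
    "science": 0.2,
    "soccer": 0.1,
    "sports": 0.05,
}
predicted = label_with_ancestors(scores, PARENTS)
print(sorted(predicted))  # ['physics', 'quantum mechanics', 'science']
print(example_f1([{"physics", "science"}], [predicted]))  # 0.8
```

In this toy run, "physics" and "science" score below the threshold on their own but are still included because they are ancestors of the single core class, which is the coverage step the abstract attributes to human experts.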

Year	Venue	DocType	Citations	PageRank	References	Authors
2021	NAACL-HLT	Conference	0	0.34	0	6

Authors (6 rows)

Name	Order	Citations	PageRank
Jiaming Shen	1	23	3.21
Wenda Qiu	2	0	0.34
Yu Meng	3	49	11.09
Jingbo Shang	4	88	15.88
Xiang Ren	5	885	60.08
Jiawei Han	6	43085	3824.48
