Out-of-Category Document Identification Using Target-Category Names as Weak Supervision - Citegraph

Paper Info

Title
Out-of-Category Document Identification Using Target-Category Names as Weak Supervision

Abstract
Identifying outlier documents, whose content differs from that of the majority of documents in a corpus, plays an important role in managing large text collections. However, because explicit information about the inlier (or target) distribution is absent, existing unsupervised outlier detectors are likely to produce unreliable results depending on the density or diversity of the outliers in the corpus. To address this challenge, we introduce a new task referred to as out-of-category detection, which aims to distinguish documents according to their semantic relevance to the inlier (or target) categories by using the category names as weak supervision. In practice, this task is widely applicable because it can flexibly designate the scope of target categories according to users' interests while requiring only the target-category names as minimal guidance. In this paper, we present an out-of-category detection framework that effectively measures how confidently each document belongs to one of the target categories. Our framework adopts a two-step approach to take advantage of both (i) a discriminative text embedding and (ii) a neural text classifier. Experiments on real-world datasets demonstrate that our framework achieves the best detection performance among all baseline methods in various scenarios specifying different target categories.

Year	2021
DOI	10.1109/ICDM51629.2021.00041
Venue	2021 21st IEEE International Conference on Data Mining (ICDM 2021)
Keywords	Text outlier detection, Out-of-category detection, Discriminative text embedding, Weakly supervised classification
DocType	Conference
ISSN	1550-4786
Citations	0
PageRank	0.34
References	6
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Dongha Lee	1	14	6.77
Dongmin Hyun	2	0	0.34
Jiawei Han	3	43085	3824.48
Hwanjo Yu	4	1715	114.02
