Mining Infrequent High-Quality Phrases from Domain-Specific Corpora - Citegraph

Paper Info

Title
Mining Infrequent High-Quality Phrases from Domain-Specific Corpora

Abstract
Phrase mining is a fundamental task in text analysis, with downstream applications such as named entity recognition, topic modeling, and relation extraction. In this paper, we focus on mining high-quality phrases from domain-specific corpora, with special consideration of infrequent ones. Previous methods might miss infrequent high-quality phrases in the candidate selection stage. These methods also rely on explicit features to mine phrases while rarely considering implicit features. In addition, completeness is rarely considered explicitly when evaluating a high-quality phrase. We propose a novel approach that exploits a sequence labeling model to capture infrequent phrases, and we employ implicit semantic features and contextual POS tag statistics to measure meaningfulness and completeness, respectively. Experiments on four real-world corpora demonstrate that our method achieves significant improvements over previous state-of-the-art methods across different domains and languages.
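
As a rough illustration of why a sequence labeling model can surface a phrase no matter how rarely it occurs, the sketch below decodes per-token tags into phrase spans. The BIO tag scheme, the function name `decode_phrases`, and the hand-written example tags are assumptions made for illustration; the abstract does not say which labeling scheme or model the authors use.

```python
from typing import List


def decode_phrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect phrases from BIO tags (B = begin, I = inside, O = outside)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new phrase starts; close the previous one if it is still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open phrase: close any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical tags standing in for the output of a trained labeling model.
tokens = ["Patients", "with", "atrial", "fibrillation", "received", "warfarin"]
tags = ["O", "O", "B", "I", "O", "B"]
phrases = decode_phrases(tokens, tags)
print(phrases)
```

Because the spans come from token-level predictions rather than from frequency counts, a phrase that appears only once in the corpus can still be extracted as a candidate.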

Year	2020
DOI	10.1145/3340531.3412029
Venue	CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020
DocType	Conference
ISBN	978-1-4503-6859-9
Citations	0
PageRank	0.34
References	23
Authors	8

Authors (8 rows)

Name	Order	Citations	PageRank
Li Wang	1	38	15.46
Wei Zhu	2	0	2.37
Sihang Jiang	3	0	1.69
Sheng Zhang	4	0	0.34
KeQiang Wang	5	9	3.77
Yuan Ni	6	11	4.61
Guotong Xie	7	1	7.51
Yanghua Xiao	8	482	54.90
