Title
PKU Paraphrase Bank: A Sentence-Level Paraphrase Corpus for Chinese
Abstract
One of the main challenges in paraphrase research is the lack of large-scale, high-quality corpora, a problem that is particularly severe for languages other than English. In this paper, we present a simple and effective unsupervised learning model that automatically extracts high-quality sentence-level paraphrases from multiple Chinese translations of the same source texts. Applying this model, we obtain a large-scale paraphrase corpus containing 509,832 pairs of paraphrased sentences. The quality of the corpus is examined manually. The model is language-independent, so paraphrase corpora for other languages can be built in the same way.
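The abstract does not spell out the extraction model, so the following is only a minimal sketch of the general idea suggested by the keywords (sentence embedding, sentence similarity): score sentences from two translations of the same source text against each other and keep the best-matching pairs. Everything here is an assumption made for illustration, not the authors' method: the character-bigram "embedding" stands in for a learned sentence embedding, and the names `embed`, `cosine`, `extract_pairs`, `threshold`, and `upper` are hypothetical.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy sentence vector: character-bigram counts.

    Assumption: a stand-in for a learned sentence embedding.
    """
    chars = [c for c in sentence if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm = norm_u * norm_v
    return dot / norm if norm else 0.0


def extract_pairs(translation_a, translation_b, threshold=0.3, upper=0.95):
    """For each sentence in A, find its most similar sentence in B.

    A pair is kept only if its score lies in [threshold, upper):
    low scores are treated as unrelated, and near-identical sentences
    are dropped as trivial paraphrases. Both cut-offs are illustrative
    assumptions, not values from the paper.
    """
    vecs_b = [embed(s) for s in translation_b]
    pairs = []
    for sent_a in translation_a:
        vec_a = embed(sent_a)
        scores = [cosine(vec_a, vec_b) for vec_b in vecs_b]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if threshold <= scores[best] < upper:
            pairs.append((sent_a, translation_b[best], scores[best]))
    return pairs
```

Called as `extract_pairs(sentences_a, sentences_b)` with two lists of sentences taken from different translations of the same text, the function returns `(sentence_a, sentence_b, score)` tuples. Because it relies only on character bigrams, the sketch works on unsegmented Chinese text as well as on other languages.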
Year
2019
DOI
10.1007/978-3-030-32233-5_63
Venue
Lecture Notes in Artificial Intelligence
Keywords
Paraphrase, Paraphrase extraction, Sentence embedding, Sentence similarity
DocType
Conference
Volume
11838
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
4
Name          Order  Citations  PageRank
Bowei Zhang   1      2          0.71
Weiwei Sun    2      0          0.34
Xiaojun Wan   3      1685       125.70
Zongming Guo  4      778        81.98