Abstract |
---|
One of the main challenges in paraphrase research is the lack of large-scale, high-quality corpora, a problem that is particularly serious for languages other than English. In this paper, we present a simple and effective unsupervised learning model that automatically extracts high-quality sentence-level paraphrases from multiple Chinese translations of the same source texts. Applying this model, we obtain a large-scale paraphrase corpus containing 509,832 pairs of paraphrased sentences. The quality of the new corpus is manually examined. The model is language-independent, meaning that paraphrase corpora for other languages can be built in the same way. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1007/978-3-030-32233-5_63 | Lecture Notes in Artificial Intelligence |
Keywords | DocType | Volume |
---|---|---|
Paraphrase, Paraphrase extraction, Sentence embedding, Sentence similarity | Conference | 11838 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |
References | Authors |
---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bowei Zhang | 1 | 2 | 0.71 |
Weiwei Sun | 2 | 0 | 0.34 |
Xiaojun Wan | 3 | 1685 | 125.70 |
Zongming Guo | 4 | 778 | 81.98 |