A Semantic Textual Similarity Calculation Model Based On Pre-Training Model - Citegraph

Paper Info

Title
A Semantic Textual Similarity Calculation Model Based On Pre-Training Model

Abstract
As a basic research topic in natural language processing, text similarity calculation is widely used in plagiarism checking and sentence search. Traditional approaches construct text vectors from TF-IDF alone and use the cosine of the angle between the vectors to measure the similarity of two texts. However, this approach cannot detect texts that differ in surface wording but are semantically similar. To address this problem, we propose pre-training on text based on the ERNIE semantic model of Paddle-Hub and cast similar text detection as a classification problem. To address the class imbalance in the training set caused by the distribution of similar texts in the data set, we also propose OSConfusion, an oversampling method based on confusion sampling. The experimental results show that the proposed method handles paper comparison well and can identify repeated paragraphs with different text representations. In predicting similar document pairs, ERNIE-SIM with OSConfusion outperforms ERNIE-SIM without OSConfusion in both precision and recall.
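The TF-IDF and cosine baseline that the abstract contrasts against can be illustrated with a minimal sketch. This is not the paper's code: the whitespace tokenizer, the smoothed IDF formula, the function names `tfidf_vectors` and `cosine`, and the example sentences are all assumptions chosen for illustration. It shows why a paraphrase that shares no words with the original scores zero under this kind of measure.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumptions: lowercase whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which keeps every weight positive.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical example texts: an exact copy and a paraphrase with no shared words.
docs = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "a feline rested upon a rug",
]
vecs = tfidf_vectors(docs)
same = cosine(vecs[0], vecs[1])        # identical wording
paraphrase = cosine(vecs[0], vecs[2])  # similar meaning, disjoint vocabulary
print(same, paraphrase)
```

Because the score depends only on overlapping terms, the paraphrase is treated as unrelated even though its meaning is close, which is the gap a semantic model is meant to close.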

Year	2021
DOI	10.1007/978-3-030-82147-0_1
Venue	KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2021, PT II
Keywords	Text similarity, Pre-training, Classification, Natural language processing, Deep learning
DocType	Conference
Volume	12816
ISSN	0302-9743
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Zhaoyun Ding	1	29	5.90
Kai Liu	2	0	0.34
Wenhao Wang	3	5	9.95
Bin Liu	4	10	3.53
