Title
Machine Learning Models for Paraphrase Identification and its Applications on Plagiarism Detection
Abstract
Paraphrase Identification, or Natural Language Sentence Matching (NLSM), is one of the important and challenging tasks in Natural Language Processing: given a pair of sentences, the task is to identify whether one sentence is a paraphrase of the other. A paraphrase of a sentence conveys the same meaning, but its structure and the sequence of its words vary. The task is challenging because it is difficult to infer the proper context of a sentence given its short length, and defining similarity metrics over the inferred contexts of a sentence pair is not straightforward either. Its applications, however, are numerous. This work explores various machine learning algorithms to model the task and applies different input encoding schemes. Specifically, we created models using Logistic Regression, Support Vector Machines, and different Neural Network architectures. Among the compared models, as expected, the Recurrent Neural Network (RNN) is best suited to our paraphrase identification task. We also propose plagiarism detection as one of the areas where Paraphrase Identification can be effectively applied.
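The abstract does not say which features or input encodings were used, so the following is only a minimal sketch of the logistic-regression baseline it mentions, assuming a hypothetical hand-crafted pair encoding: a bias term, the Jaccard overlap of the two sentences' word sets, and the ratio of their token lengths. The helper names (tokenize, pair_features, train_logistic_regression, predict_proba) and the toy sentence pairs are illustrative assumptions, not the authors' code or data.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def pair_features(s1, s2):
    """Encode a sentence pair as [bias, Jaccard word overlap, length ratio].

    Hypothetical hand-crafted encoding chosen for illustration only.
    """
    tokens1, tokens2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(tokens1), set(tokens2)
    union = set1 | set2
    jaccard = len(set1 & set2) / len(union) if union else 0.0
    longest = max(len(tokens1), len(tokens2))
    length_ratio = min(len(tokens1), len(tokens2)) / longest if longest else 0.0
    return [1.0, jaccard, length_ratio]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=1.0, epochs=5000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w, s1, s2):
    """Estimated probability that s2 is a paraphrase of s1."""
    x = pair_features(s1, s2)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy pairs (label 1 = paraphrase, 0 = not a paraphrase).
pairs = [
    ("The cat sat on the mat.", "On the mat sat the cat.", 1),
    ("He bought a new car yesterday.", "Yesterday he purchased a new car.", 1),
    ("The meeting was moved to Friday.", "The meeting has been moved to Friday.", 1),
    ("The cat sat on the mat.", "Stock prices fell sharply today.", 0),
    ("He bought a new car yesterday.", "The river flooded the small village.", 0),
    ("The meeting was moved to Friday.", "She enjoys painting landscapes.", 0),
]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
weights = train_logistic_regression(X, y)

for a, b, label in pairs:
    print(label, round(predict_proba(weights, a, b), 3), "|", a, "/", b)
```

The same pair_features vectors could be passed to a Support Vector Machine instead. Because the Jaccard feature compares word sets, it ignores word order entirely ("dog bites man" and "man bites dog" score 1.0), which is one reason sequence models such as the RNNs reported above can encode a sentence pair more faithfully than overlap features.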
Year: 2019
DOI: 10.1109/ICBK.2019.00021
Venue: 2019 IEEE International Conference on Big Knowledge (ICBK)
Keywords: Paraphrase Identification, Machine learning, Long Short Term Memory Networks, NLP
Field: Plagiarism detection, Computer science, Support vector machine, Recurrent neural network, Paraphrase, Artificial intelligence, Natural language sentence, Artificial neural network, Sentence, Machine learning, Encoding (memory)
DocType: Conference
ISBN: 978-1-7281-4608-9
Citations: 1
PageRank: 0.38
References: 0
Authors: 13
Name | Order | Citations | PageRank
Ethan Hunt | 1 | 1 | 0.38
Binay Dahal | 2 | 1 | 0.38
Justin Zhan | 3 | 1 | 0.72
Laxmi Gewali | 4 | 72 | 16.39
Paul Y. Oh | 5 | 289 | 51.08
Ritvik Janamsetty | 6 | 1 | 0.38
Chanana Kinares | 7 | 1 | 0.38
Chanel Koh | 8 | 1 | 0.38
Alexis Sanchez | 9 | 1 | 0.38
Felix Zhan | 10 | 1 | 0.38
Murat Özdemir | 11 | 1 | 0.38
Shabnam Waseem | 12 | 1 | 0.38
Osman Yolcu | 13 | 1 | 0.38