Title
Detection of Opinion Spam with Character n-grams
Abstract
In this paper we treat the detection of opinion spam as a stylistic classification task because, within a given domain, deceptive and truthful opinions are similar in content but differ in the way they are written (style). In particular, we propose using character n-grams as features, since they have been shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus of 1600 hotel reviews, considering both positive and negative reviews, and compared the results obtained with character n-grams against those obtained with word n-grams. Moreover, we evaluated the effectiveness of character n-grams while decreasing the training set size in order to simulate realistic training conditions. The results show that character n-grams are good features for the detection of opinion spam; they seem to capture the content of deceptive opinions and the writing style of the deceiver better than word n-grams do. In particular, the results show an improvement of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions, respectively. Furthermore, character n-grams achieve good performance even with a very small training corpus: using only 25% of the training set, a Naive Bayes classifier reached F1 values of up to 0.80 for both opinion polarities.
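The idea in the abstract, character n-grams fed to a Naive Bayes classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram length (n=4), the lowercasing, the add-one smoothing, the names `char_ngrams` and `NaiveBayes`, and the toy reviews are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=4):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over n-gram counts."""

    def fit(self, texts, labels, n=4):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of documents
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, n))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, grams in self.counts.items():
            denom = sum(grams.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)  # class prior
            for g in char_ngrams(text, self.n):
                if g in self.vocab:  # skip n-grams never seen in training
                    score += math.log((grams[g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews, invented for illustration (not from the corpus).
train_texts = [
    "My husband and I had the most amazing stay, truly luxurious!",
    "Room was clean, check-in took ten minutes, parking cost $40.",
]
train_labels = ["deceptive", "truthful"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Check-in was quick and the room was clean."))
```

Because the n-grams span word boundaries, punctuation, and spacing, they pick up both word fragments (content) and writing habits (style), which is the property the abstract relies on.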
Year: 2015
DOI: 10.1007/978-3-319-18117-2_21
Venue: Computational Linguistics and Intelligent Text Processing (CICLing 2015), Part II
Keywords: Opinion spam, deceptive detection, character n-grams, word n-grams
DocType: Conference
Volume: 9042
ISSN: 0302-9743
Citations: 13
PageRank: 0.56
References: 17
Authors: 4
Name                        Order   Citations   PageRank
Donato Hernández Fusilier   1       46          2.55
Manuel Montes-y-Gómez       2       638         83.97
Paolo Rosso                 3       1831        188.74
Rafael Guzmán-Cabrera       4       78          12.63