Detection Of Opinion Spam With Character N-Grams - Citegraph

Paper Info

Title
Detection Of Opinion Spam With Character N-Grams

Abstract
In this paper we treat the detection of opinion spam as a stylistic classification task because, within a particular domain, deceptive and truthful opinions are similar in content but differ in the way they are written (style). In particular, we propose using character n-grams as features, since they have been shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus of 1600 hotel reviews, considering both positive and negative reviews. We compared the results obtained with character n-grams against those obtained with word n-grams. Moreover, we evaluated the effectiveness of character n-grams while decreasing the training set size in order to simulate real training conditions. The results show that character n-grams are good features for the detection of opinion spam; they seem to capture the content of deceptive opinions and the writing style of the deceiver better than word n-grams do. In particular, the results show an improvement of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions, respectively. Furthermore, character n-grams yield good performance even with a very small training corpus: using only 25% of the training set, a Naive Bayes classifier achieved F-1 values up to 0.80 for both opinion polarities.
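
As a rough illustration of the idea in the abstract, the sketch below extracts overlapping character n-grams from a text and feeds their counts to a small multinomial Naive Bayes classifier. It is not the authors' implementation: the abstract does not state the n-gram length, the preprocessing, or the smoothing scheme, so the default n=4, the lowercasing step, the add-one smoothing, and the names `char_ngrams` and `NaiveBayes` are all assumptions made for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=4):
    """Return counts of all overlapping character n-grams in text.

    Lowercasing is an assumption of this sketch, not something the
    abstract specifies.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts.

    Uses add-one (Laplace) smoothing, which is an assumed choice.
    """

    def __init__(self, n=4):
        self.n = n
        self.counts = {}       # label -> Counter of n-gram counts
        self.docs = Counter()  # label -> number of training documents
        self.vocab = set()     # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # Smoothed denominator: class token total plus vocabulary size.
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of log likelihoods of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            for gram, k in grams.items():
                score += k * math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the features are substrings rather than whole words, the same n-gram can carry both lexical cues (word fragments) and stylistic cues (punctuation, spacing, affixes), which is the property the abstract attributes to character n-grams.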

Year	2015
DOI	10.1007/978-3-319-18117-2_21
Venue	COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II
Keywords	Opinion spam, deceptive detection, character n-grams, word n-grams
DocType	Conference
Volume	9042
ISSN	0302-9743
Citations	13
PageRank	0.56
References	17
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Donato Hernández Fusilier	1	46	2.55
Manuel Montes-Y-Gómez	2	638	83.97
Paolo Rosso	3	1831	188.74
Rafael Guzmán-Cabrera	4	78	12.63
