Abstract | ||
---|---|---|
In this paper we consider the detection of opinion spam as a stylistic classification task because, given a particular domain, deceptive and truthful opinions are similar in content but differ in the way they are written (style). In particular, we propose using character n-grams as features, since they have been shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus composed of 1600 hotel reviews, considering positive and negative reviews. We compared the results obtained with character n-grams against those obtained with word n-grams. Moreover, we evaluated the effectiveness of character n-grams while decreasing the training set size in order to simulate real training conditions. The results show that character n-grams are good features for the detection of opinion spam; they seem to capture the content of deceptive opinions and the writing style of the deceiver better than word n-grams do. In particular, results show an improvement of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions, respectively. Furthermore, character n-grams also yield good performance with a very small training corpus. Using only 25% of the training set, a Naive Bayes classifier showed F1 values up to 0.80 for both opinion polarities. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-18117-2_21 | COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II |
Keywords | DocType | Volume |
---|---|---|
Opinion spam, deceptive detection, character n-grams, word n-grams | Conference | 9042 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 13 | 0.56 |
References | Authors | |
---|---|---|
17 | 4 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Donato Hernández Fusilier | 1 | 46 | 2.55 |
Manuel Montes-Y-Gómez | 2 | 638 | 83.97 |
Paolo Rosso | 3 | 1831 | 188.74 |
Rafael Guzmán-Cabrera | 4 | 78 | 12.63 |
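
The abstract describes representing reviews by character n-grams and classifying them with Naive Bayes. The following is a minimal sketch of that idea, not the authors' implementation: the n-gram length `n=4`, the add-one smoothing, the helper names `char_ngrams` and `NaiveBayes`, and the two toy sentences with their labels are all assumptions made for illustration and are not taken from the paper or its corpus.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Return all overlapping character n-grams of `text` (spaces and punctuation kept)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (add-one smoothing assumed)."""

    def __init__(self, n=4):
        self.n = n  # n-gram length; 4 is an illustrative choice, not a value from the abstract

    def fit(self, texts, labels):
        self.counts = {}             # label -> Counter of n-gram frequencies
        self.docs = Counter(labels)  # label -> number of training documents
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(char_ngrams(text, self.n))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        # n-grams never seen in training are ignored.
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)  # log prior
            score += sum(math.log((counter[g] + 1) / denom) for g in grams)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only (not drawn from the hotel review corpus).
toy_texts = [
    "my husband and I loved this luxury hotel",
    "the room was small but the location was fine",
]
toy_labels = ["deceptive", "truthful"]

model = NaiveBayes(n=4).fit(toy_texts, toy_labels)
prediction = model.predict("loved this hotel")
print(char_ngrams("hotel", 4), prediction)
```

Because the n-grams slide over raw characters, they pick up word fragments, spacing, and punctuation as well as whole words, which is one way to read the abstract's claim that they carry both lexical and stylistic information.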