Title
Sentiment-based Classification of Radical Text on the Web.
Abstract
The total number of webpages has grown substantially since the birth of the Internet, and so too has the number of webpages dedicated to radical yet subtle content. These circumstances call for a guided data collection method, one that can sidestep the laborious manual methods that have classically been used. Simple keyword analysis has not been sufficient to identify radical sites on Web 1.0: pro-extremist, anti-extremist, and news sites, for example, may use the same keywords to discuss the same event but have very different motivations. To explore this problem, we used a web-crawler to collect 20,000 webpages from five sentiment-based classes and assess their differences: (1) radical Right sites; (2) radical Islamic sites; (3) anti-extremist sites; (4) news source sites discussing extremism; and (5) sites that did not discuss extremism. Parts-of-Speech (POS) tagging was used to identify 198 of the most frequent keywords within the data, and the sentiment value for each of these keywords was calculated for each webpage using sentiment analysis. With these values, a decision tree was applied to three classification models. Results suggest that radical Islamic text can be classified at a much higher rate of success than radical Right text.
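The pipeline described in the abstract (per-keyword sentiment values per webpage, then a decision tree) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the toy `LEXICON`, the context-window scoring in `keyword_sentiment`, the four sample pages, and the two hand-picked keywords are hypothetical. The POS-tagging step that selects the 198 keywords is skipped, and `best_split` is only a one-level decision tree (a stump) chosen by Gini impurity.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> polarity). The abstract does not name
# the sentiment tool or its scores; these values are illustrative only.
LEXICON = {"good": 1.0, "great": 1.0, "peace": 1.0,
           "bad": -1.0, "evil": -1.0, "hate": -1.0}


def keyword_sentiment(text, keywords, window=3):
    """Map each keyword to the mean, over its occurrences, of the summed
    lexicon scores of the words within `window` tokens on either side."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    scores = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in keywords:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            scores[tok].append(sum(LEXICON.get(w, 0.0) for w in context))
    # Keywords that never occur on the page get a neutral score of 0.0.
    return {k: (sum(scores[k]) / len(scores[k]) if scores[k] else 0.0)
            for k in keywords}


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels, keywords):
    """Return (impurity, keyword, threshold) for the single split with the
    lowest weighted Gini impurity, i.e. the root of a decision tree."""
    best = None
    for k in keywords:
        values = sorted({r[k] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[k] <= thr]
            right = [y for r, y in zip(rows, labels) if r[k] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, k, thr)
    return best


# Hypothetical labeled pages standing in for the crawled classes.
pages = [
    ("The government is evil and we hate the government.", "radical"),
    ("Evil media, bad government.", "radical"),
    ("The government did a good job keeping the peace.", "other"),
    ("Great news from the government today.", "other"),
]
keywords = ["government", "media"]  # hand-picked here, not POS-derived

rows = [keyword_sentiment(text, keywords) for text, _ in pages]
labels = [label for _, label in pages]
best = best_split(rows, labels, keywords)
print(rows)
print(best)
```

On this toy data, the same keyword ("government") appears in every page, so keyword presence alone cannot separate the classes, while the sentiment around that keyword can. A full decision tree would apply `best_split` recursively to each side of the split.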
Year: 2016
DOI: 10.1109/EISIC.2016.41
Venue: European Intelligence and Security Informatics Conference
Keywords: Sentiment Analysis, Decision Trees, Extremism
Field: Data modeling, Data collection, Decision tree, World Wide Web, Web page, Sentiment analysis, Computer science, Radical right, The Internet
DocType: Conference
ISSN: 2572-3723
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name, Order, Citations, PageRank
Ryan Scrivens, 1, 2, 0.74
Richard Frank, 2, 20, 5.61