Title
Feature Selection Using Sampling with Replacement, Covering Arrays and Rule-Induction Techniques to Aid Polarity Detection in Twitter Sentiment Analysis.
Abstract
One of the main tasks in analyzing sentiment on Twitter is polarity detection, i.e. the classification of 'tweets' in terms of the feelings, opinions and attitudes expressed. Polarity detection on Twitter by means of machine learning methods is generally affected by the use of irrelevant, redundant, noisy or correlated features, especially when a high-dimensional representation is used in the feature set. There is thus a need for a selection method that removes those features that render the classification algorithm inefficient. In this work, we propose a feature selection method based on the concept of bagging, with two important modifications: (i) the use of covering arrays to support the process of building bootstrap samples; and (ii) the use of the results of rule-induction techniques (JRIP, C4.5, CART or others) to generate the reduced representation of tweets with the features selected. The experimental results show that, with the proposed method, we obtain similar or better results than those obtained with the original representation (a set of 91 features used in research related to polarity detection in Twitter), which opens the possibility of simpler and faster process models. A subset of features is thereby identified that can facilitate improvements in future polarity detection proposals on Twitter.
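The general idea in the abstract (bootstrap samples, a covering array guiding how each sample is built, and a rule learner whose output determines the selected features) can be illustrated with a minimal sketch. This is not the authors' implementation; several pieces are assumptions made for illustration. Each row of a small binary covering array is treated as a feature mask for one bootstrap sample, a single-threshold decision stump stands in for JRIP, C4.5 or CART, and a feature is kept when enough samples vote for it. The names `select_features`, `best_stump_feature`, `masks` and `min_votes` are hypothetical, and the data is a toy set rather than tweets.

```python
import random
from collections import Counter

def bootstrap(rows, rng):
    """Draw len(rows) items from rows, sampling with replacement."""
    return [rng.choice(rows) for _ in rows]

def best_stump_feature(sample, allowed):
    """Stand-in for a rule learner (assumption, not JRIP/C4.5/CART):
    return the allowed feature whose best single-threshold rule
    misclassifies the fewest rows of the sample."""
    best_f, best_err = None, len(sample) + 1
    for f in allowed:
        for t in sorted({x[f] for x, _ in sample}):
            err = sum((x[f] > t) != bool(y) for x, y in sample)
            err = min(err, len(sample) - err)  # also consider the inverted rule
            if err < best_err:
                best_f, best_err = f, err
    return best_f

def select_features(rows, masks, min_votes, seed=0):
    """Keep features chosen by the rule learner in at least min_votes samples."""
    rng = random.Random(seed)
    votes = Counter()
    for mask in masks:
        # Assumption: a covering-array row decides which features this sample sees.
        allowed = [f for f, bit in enumerate(mask) if bit]
        if not allowed:
            continue
        chosen = best_stump_feature(bootstrap(rows, rng), allowed)
        if chosen is not None:
            votes[chosen] += 1
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Toy data: (features, label); feature 0 equals the label, the others are noise.
rows = [((1, 0, 1), 1), ((1, 1, 0), 1), ((0, 0, 1), 0),
        ((0, 1, 0), 0), ((1, 1, 1), 1), ((0, 0, 0), 0)]

# Binary covering array of strength 2 over 3 columns: every pair of columns
# shows all four value combinations (00, 01, 10, 11) across the rows.
masks = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

selected = select_features(rows, masks, min_votes=2)
print(selected)
```

In this toy setting only feature 0 collects two votes, since it is the best rule in both masks that expose it. A real pipeline would replace the stump with an actual rule learner and collect every feature appearing in the induced rules, not just one per sample.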
Year
2018
DOI
10.1007/978-3-030-03928-8_38
Venue
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2018
Keywords
Sentiment analysis, Polarity detection, Covering arrays, Feature selection, Twitter
Field
Simple random sample, Pattern recognition, Feature selection, Sentiment analysis, Computer science, Process modeling, Feature set, Rule induction, Artificial intelligence, Bootstrapping (electronics)
DocType
Conference
Volume
11238
ISSN
0302-9743
Citations
0
PageRank
0.34
References
10
Authors
4
Name                    Order  Citations  PageRank
Jorge Villegas          1      0          0.34
Carlos Cobos            2      44         3.44
Martha Mendoza          3      2          1.38
Enrique Herrera-Viedma  4      13105      642.24