Title | ||
---|---|---|
Feature Selection Using Sampling with Replacement, Covering Arrays and Rule-Induction Techniques to Aid Polarity Detection in Twitter Sentiment Analysis. |
Abstract | ||
---|---|---|
One of the main tasks in Twitter sentiment analysis is polarity detection, i.e., classifying tweets according to the feelings, opinions and attitudes they express. Polarity detection on Twitter with machine learning methods is generally degraded by irrelevant, redundant, noisy or correlated features, especially when the feature set uses a high-dimensional representation. There is thus a need for a selection method that removes the features that make the classification algorithm inefficient. In this work, we propose a feature selection method based on the concept of bagging, with two important modifications: (i) the use of covering arrays to support the process of building bootstrap samples; and (ii) the use of the results of rule-induction techniques (JRIP, C4.5, CART or others) to generate the reduced representation of tweets from the selected features. The experimental results show that the proposed method yields results similar to or better than those obtained with the original representation (a set of 91 features used in research on polarity detection in Twitter), opening the possibility of simpler and faster process models. The method thereby identifies a subset of features that can facilitate improvements in future polarity detection proposals for Twitter. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/978-3-030-03928-8_38 | ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2018 |
Keywords | Field | DocType |
Sentiment analysis,Polarity detection,Covering arrays,Feature selection,Twitter | Simple random sample,Pattern recognition,Feature selection,Sentiment analysis,Computer science,Process modeling,Feature set,Rule induction,Artificial intelligence,Bootstrapping (electronics) | Conference |
Volume | ISSN | Citations |
11238 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 10 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jorge Villegas | 1 | 0 | 0.34 |
Carlos Cobos | 2 | 44 | 3.44 |
Martha Mendoza | 3 | 2 | 1.38 |
Enrique Herrera-Viedma | 4 | 13105 | 642.24 |
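
The feature-selection idea summarized in the abstract (bagging, with covering arrays guiding the bootstrap samples and a rule inducer reporting which features it used) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not say how covering-array rows map to samples, so each row is treated here as a binary feature mask; a one-level threshold rule (`induce_rule`) stands in for JRIP, C4.5 or CART; and the hand-written `COVERING_ARRAY`, the helper names and the four-feature synthetic data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical strength-2 binary covering array for 4 features:
# every pair of columns shows all of (0,0), (0,1), (1,0), (1,1).
COVERING_ARRAY = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]

def is_strength2_covering(rows):
    """Return True if every column pair covers all four 0/1 combinations."""
    needed = {(0, 0), (0, 1), (1, 0), (1, 1)}
    return all(
        {(r[i], r[j]) for r in rows} == needed
        for i, j in combinations(range(len(rows[0])), 2)
    )

def stump_errors(X, y, f, t):
    """Errors of the rule 'x[f] <= t -> majority label, else other majority'."""
    left = [label for row, label in zip(X, y) if row[f] <= t]
    right = [label for row, label in zip(X, y) if row[f] > t]
    return sum(
        len(side) - Counter(side).most_common(1)[0][1]
        for side in (left, right) if side
    )

def induce_rule(X, y, allowed):
    """Toy stand-in for a rule inducer: best single-feature threshold rule.

    Ties are broken in favour of the lowest feature index.
    """
    best = None  # (errors, feature, threshold)
    for f in allowed:
        for t in sorted({row[f] for row in X}):
            e = stump_errors(X, y, f, t)
            if best is None or e < best[0]:
                best = (e, f, t)
    return best

def select_features(X, y, masks, rng):
    """Count how often each feature is used by the rule of each bootstrap."""
    votes = Counter()
    n = len(X)
    for mask in masks:
        allowed = [f for f, bit in enumerate(mask) if bit]
        if not allowed:  # assumption: skip rows that enable no feature
            continue
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        _, feature, _ = induce_rule(Xb, yb, allowed)
        votes[feature] += 1
    return votes

# Synthetic data: feature 0 equals the label, features 1-3 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label, rng.random(), rng.random(), rng.random()] for label in y]

votes = select_features(X, y, COVERING_ARRAY, rng)
print("features used by at least one rule:", sorted(votes))
print("votes per feature:", dict(votes))
```

In this toy setup the informative feature is chosen whenever its mask bit is set, while a noise feature is chosen only in the sample that excludes it. A fuller version would replace `induce_rule` with a real rule learner and keep every feature that appears in the induced rules.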