Title
Sentiment Classification From Multi-Class Imbalanced Twitter Data Using Binarization
Abstract
Twitter has become one of the most dynamically developing areas of social media. Due to the concise nature of messages, rapid publication, and high outreach, people share more and more of their opinions, thoughts, and commentaries using this medium. Sentiment analysis is a subfield of natural language processing that concentrates on automatically categorizing opinions and attitudes expressed in a given portion of textual information. This requires dedicated machine learning solutions that are able to handle various difficulties embedded in the nature of the data. In this paper, we present an efficient framework for automatic sentiment analysis from high-dimensional and sparse datasets that suffer from multi-class imbalance. We propose to approach it by applying a one-vs-one binary decomposition and reducing the dimensionality of each pairwise class set using Multiple Correspondence Analysis. Then we apply preprocessing to alleviate the skewed distributions in the reduced number of dimensions. After that, we train a binary classifier on each pair of classes and combine them using a weighted multi-class reconstruction that promotes minority classes. The proposal is evaluated on a large Twitter dataset, and the obtained results are in favor of the proposed solution.
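The pipeline outlined in the abstract (pairwise decomposition, per-pair rebalancing, per-pair binary classifiers, weighted reconstruction) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the function names are hypothetical, a nearest-centroid rule stands in for the unspecified binary classifier, random oversampling stands in for the unspecified preprocessing, the square-root inverse-frequency vote weight is one possible way to promote minority classes, and the Multiple Correspondence Analysis step is omitted entirely.

```python
import math
import random
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def oversample(rows, target, rng):
    """Random oversampling: duplicate rows until the class has `target` rows."""
    extra = [rng.choice(rows) for _ in range(target - len(rows))]
    return rows + extra


def fit_ovo(X, y, seed=0):
    """Train one stand-in binary model per pair of classes (one-vs-one)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)

    models = {}
    for a, b in combinations(sorted(by_class), 2):
        # Rebalance each pair separately (assumed preprocessing step).
        target = max(len(by_class[a]), len(by_class[b]))
        rows_a = oversample(by_class[a], target, rng)
        rows_b = oversample(by_class[b], target, rng)
        # Nearest-centroid rule as a placeholder binary classifier.
        models[(a, b)] = (centroid(rows_a), centroid(rows_b))

    # Hypothetical vote weights: rarer classes receive a larger weight.
    counts = Counter(y)
    largest = max(counts.values())
    weights = {c: math.sqrt(largest / n) for c, n in counts.items()}
    return models, weights


def predict_ovo(models, weights, x):
    """Weighted voting over all pairwise decisions."""
    scores = Counter()
    for (a, b), (cent_a, cent_b) in models.items():
        winner = a if sq_dist(x, cent_a) <= sq_dist(x, cent_b) else b
        scores[winner] += weights[winner]
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(scores), key=scores.get)
```

Calling `fit_ovo(X, y)` on a list of numeric feature vectors and their labels returns the pairwise models and the class weights, which `predict_ovo(models, weights, x)` then uses to label a new vector. With a nearest-centroid rule the oversampling step has little practical effect; it is kept only to show where the rebalancing would sit in the pipeline.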
Year
2017
DOI
10.1007/978-3-319-59650-1_3
Venue
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2017
Keywords
Machine learning, Text mining, Sentiment analysis, Imbalanced learning, Multi-class imbalance
Field
Pairwise comparison, Multiple correspondence analysis, Social media, Binary classification, Pattern recognition, Sentiment analysis, Computer science, Outreach, Curse of dimensionality, Preprocessor, Artificial intelligence, Machine learning
DocType
Conference
Volume
10334
ISSN
0302-9743
Citations
2
PageRank
0.41
References
8
Authors
3
Name                 Order  Citations  PageRank
Bartosz Krawczyk     1      721        60.97
Bridget T. McInnes   2      280        23.66
Alberto Cano         3      130        11.20