Abstract | ||
---|---|---|
Twitter has become one of the most dynamically developing areas of social media. Due to the concise nature of messages, rapid publication, and high outreach, people share more and more of their opinions, thoughts, and commentaries using this medium. Sentiment analysis is a subfield of natural language processing that concentrates on automatically categorizing opinions and attitudes expressed in a given portion of text. This requires dedicated machine learning solutions that are able to handle various difficulties embedded in the nature of the data. In this paper, we present an efficient framework for automatic sentiment analysis from high-dimensional and sparse datasets that suffer from multi-class imbalance. We propose to approach it by applying a one-vs-one binary decomposition and reducing the dimensionality of each pairwise class set using Multiple Correspondence Analysis. Then we apply preprocessing to alleviate the skewed distributions in the reduced number of dimensions. After that, on each pair of classes we train a binary classifier and combine them using a weighted multi-class reconstruction that promotes minority classes. The proposal is evaluated on a large Twitter dataset, and the obtained results are in favor of the proposed solution. |
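
The pipeline outlined in the abstract (one-vs-one decomposition, per-pair dimensionality reduction, rebalancing, pairwise binary classifiers, weighted reconstruction) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: `reduce_dimensions` is a hypothetical placeholder standing in for Multiple Correspondence Analysis, random oversampling stands in for the unspecified preprocessing step, a nearest-centroid rule stands in for the binary classifier, and inverse class frequency is an assumed choice of vote weight. The names `fit_ovo` and `predict_ovo` are likewise hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def reduce_dimensions(X):
    # Placeholder (assumption): the abstract applies Multiple Correspondence
    # Analysis per class pair at this point. This sketch returns X unchanged.
    return X


def oversample(X, y, rng):
    # Stand-in preprocessing: randomly duplicate samples of the smaller
    # class until both classes in the pair have the same number of samples.
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def centroid(rows):
    # Component-wise mean of a list of equal-length feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_ovo(X, y, seed=0):
    rng = random.Random(seed)
    counts = Counter(y)
    total = len(y)
    # Assumed weighting: inverse class frequency, so rarer classes cast
    # heavier votes during reconstruction.
    weights = {c: total / n for c, n in counts.items()}
    models = {}
    for a, b in combinations(sorted(counts), 2):
        pair = [(x, lab) for x, lab in zip(X, y) if lab in (a, b)]
        Xp = reduce_dimensions([x for x, _ in pair])
        yp = [lab for _, lab in pair]
        Xp, yp = oversample(Xp, yp, rng)
        # Stand-in binary classifier: one centroid per class in the pair.
        models[(a, b)] = {
            c: centroid([x for x, lab in zip(Xp, yp) if lab == c])
            for c in (a, b)
        }
    return models, weights


def predict_ovo(models, weights, x):
    # Each pairwise model votes for its nearer centroid; votes are scaled
    # by the winner's class weight and the heaviest class is returned.
    votes = Counter()
    for (a, b), cents in models.items():
        winner = a if sq_dist(x, cents[a]) <= sq_dist(x, cents[b]) else b
        votes[winner] += weights[winner]
    return max(votes, key=votes.get)
```

With k classes the decomposition trains k(k-1)/2 pairwise models, so a three-class sentiment task (for example negative, neutral, positive) yields three. Because the vote weights favor rare classes, a minority class can win the reconstruction even when fewer pairwise models agree on it; whether that trade-off is desirable depends on the weighting actually used in the paper, which the abstract does not specify.
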
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-59650-1_3 | HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2017 |
Keywords | Field | DocType |
Machine learning, Text mining, Sentiment analysis, Imbalanced learning, Multi-class imbalance | Pairwise comparison, Multiple correspondence analysis, Social media, Binary classification, Pattern recognition, Sentiment analysis, Computer science, Outreach, Curse of dimensionality, Preprocessor, Artificial intelligence, Machine learning | Conference |
Volume | ISSN | Citations |
10334 | 0302-9743 | 2 |
PageRank | References | Authors |
0.41 | 8 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bartosz Krawczyk | 1 | 721 | 60.97 |
Bridget T. McInnes | 2 | 280 | 23.66 |
Alberto Cano | 3 | 130 | 11.20 |