A survey of multilingual human-tagged short message datasets for sentiment analysis tasks. - Citegraph

Paper Info

Title
A survey of multilingual human-tagged short message datasets for sentiment analysis tasks.

Abstract
Today, electronic word-of-mouth (eWOM) statements expressed on blogs, social media, and shopping platforms are very common, and they enable customers to share their points of view about the products or services they have acquired. Industry can use these eWOM statements to improve its products and services, and customers can use them to make better purchase decisions. Sentiment analysis (SA) techniques can be used to extract and analyze these eWOM statements. Research on SA has advanced considerably in recent years, and its applications in business management have grown exponentially. Automatic techniques (such as machine learning, deep learning, and statistical approaches) have been used for this purpose. However, training a machine to process or analyze sentiment is a hard task, mainly because of the complexity of natural language, and the task is even more complicated in multilingual environments. Training datasets, one of the key resources for achieving better results, remain very scarce. Training datasets are, in fact, a reservoir of information that serves to teach and refine the skills of automatic techniques; hence, the higher the quality of the training datasets, the better the predictive power of sentiment analysis tasks. English datasets are relatively easy to find in the literature, but datasets in other languages are very scarce. This paper therefore describes and compiles information on 25 datasets gleaned from short messages (statements expressed on social media and shopping platforms) in seven different languages, mostly from Twitter. For quality reasons, all the resources were human-tagged, and they are currently available to the scientific community. A new English sentiment dataset extracted from Twitter has also been compiled, with each message evaluated subjectively.
The current survey thus aims to provide essential quality information for future research on automatic sentiment analysis in monolingual or multilingual scenarios.

Year	DOI	Venue
2018	10.1007/s00500-017-2766-5	Soft Comput.
Keywords	Field	DocType
Sentiment analysis, Dataset, Corpus, Short messages, Multilingual, Twitter, Human-tagged	Data science, Social media, Statistic, Predictive power, Sentiment analysis, Computer science, Natural language, Business management, Artificial intelligence, Deep learning, Machine learning	Journal
Volume	Issue	ISSN
22	24	1432-7643
Citations	PageRank	References
0	0.34	80
Authors
3

Name	Order	Citations	PageRank
F. Steiner-Correa	1	0	0.34
María I. Viedma-del Jesús	2	0	0.34
A. G. Lopez-Herrera	3	0	0.68
