Title
Multi-labeled Dataset of Arabic COVID-19 Tweets for Topic-Based Sentiment Classifications
Abstract
Natural Language Processing (NLP) can analyze and classify the growing volume of opinions and feelings expressed in online texts and quickly extract the required feedback. Text classification is the task of automatically labeling a textual document with the most appropriate set of labels; supervised text classifiers, however, require extensive human expertise and labeling effort. This paper builds a multi-labeled Arabic dataset by labeling each Arabic COVID-19 tweet along two dimensions based on its lexical features: related topic and associated sentiment. To this end, an extensive dataset was created from Twitter posts, containing over 32k multi-labeled tweets. The dataset will be made freely available to the Arabic computational linguistics research community. Both traditional machine learning (ML) approaches and a deep-learning approach were used to evaluate performance on this dataset. The results show that traditional ML approaches provide higher accuracy with almost stable performance when evaluated on the Twitter dataset for sentiment analysis and topic classification.
Year
2022
DOI
10.1109/EAIS51927.2022.9787700
Venue
2022 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)
Keywords
Sentiment Analysis (SA), Modern Standard Arabic (MSA), Deep learning, Topic Modeling, Classification, Categorization, Natural Language Processing (NLP), Arabic language, Latene Dialect Annotation (LDA)
DocType
Conference
ISSN
2330-4863
ISBN
978-1-6654-3707-3
Citations
0
PageRank
0.34
References
9
Authors
3