Title
Semi-Supervised Self Training to Assess the Credibility of Tweets
Abstract
Social media generates a large amount of data every day, and the value of this information is increasingly apparent. However, assessing the credibility of that information is a complex problem, and manual assessment is very time consuming. Semi-supervised learning, which combines a small amount of labeled data with a large amount of unlabeled data, can therefore be a practical approach for evaluating the credibility of information generated on social media. In this work, a typical self-training algorithm from semi-supervised learning is used to evaluate the credibility of Twitter data. The algorithm first trains a classifier on the labeled data using a supervised learning algorithm, and then uses the trained classifier to predict the category of the unlabeled data. We implemented an improved self-training algorithm that uses a repeated labeling strategy. We conducted experiments on disaster-related Twitter data to predict its credibility, using logistic regression and random forest classifiers. Our experiments show that the improved self-training algorithm produced better classification accuracy.
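The basic self-training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not include their repeated labeling strategy, which the abstract does not detail. It assumes a tiny hand-written logistic regression, synthetic 2-D features standing in for tweet features, and a confidence threshold of 0.9. The names `fit_logreg`, `predict_proba`, `self_train`, and `sample` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, epochs=300, lr=0.5):
    """Fit a tiny logistic regression with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Return the estimated probability that x belongs to class 1."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Plain self-training: pseudo-label confident points, then retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_logreg(X_lab, y_lab)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            p = predict_proba(model, x)
            if p >= threshold:
                confident.append((x, 1))
            elif p <= 1 - threshold:
                confident.append((x, 0))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from, so stop early
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = remaining
        model = fit_logreg(X_lab, y_lab)
    return model, len(pool)

# Synthetic stand-in data (assumption): class 1 near (2, 2), class 0 near (-2, -2).
random.seed(0)

def sample(center, n):
    return [[random.gauss(center, 0.5), random.gauss(center, 0.5)] for _ in range(n)]

X_lab = sample(2.0, 3) + sample(-2.0, 3)
y_lab = [1] * 3 + [0] * 3
X_unlab = sample(2.0, 20) + sample(-2.0, 20)

model, n_left = self_train(X_lab, y_lab, X_unlab)
print("unlabeled points left without a pseudo-label:", n_left)
```

In practice the hand-written classifier would likely be replaced by a library logistic regression or random forest, the two classifier types named in the abstract; the threshold and round limit here are illustrative choices only.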
Year
2021
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00206
Venue
19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021)
Keywords
Self-training, Semi-supervised Learning, Logistic Regression, Random Forest
DocType
Conference
ISSN
2158-9178
Citations
0
PageRank
0.34
References
0
Authors
6
Name            Order  Citations  PageRank
Leyu Gao        1      0          0.34
Sandeep Shah    2      0          0.34
Nasser Assery   3      0          0.34
Xiaohong Yuan   4      169        26.72
Xiuli Qu        5      0          0.34
Kaushik Roy     6      0          0.34