Abstract | ||
---|---|---|
Social media generates a large amount of data every day, and the value of this information is increasingly evident. However, judging the credibility of that information is complex, and manual assessment is very time-consuming. Semi-supervised learning, which combines a small amount of labeled data with a large amount of unlabeled data, is therefore a practical approach for evaluating the credibility of data generated on social media. In this experiment, a typical self-training algorithm from semi-supervised learning is used to evaluate the credibility of Twitter data. The algorithm first trains a classifier on the labeled data using a supervised learning algorithm, and then uses the trained classifier to predict the category of the unlabeled data. We implemented an improved self-training algorithm that uses a repeated labeling strategy. We conducted experiments on disaster-related Twitter data to predict its credibility, using logistic regression and random forest classifiers. Our experiments show that the improved self-training algorithm produced better classification accuracy. |
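
The self-training loop outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the synthetic data from `make_classification`, the function name `self_train`, the confidence `threshold`, and the `relabel` flag are all assumptions introduced here. In particular, the abstract does not define the "repeated labeling strategy", so `relabel=True` is only one possible reading of it, in which pseudo-labels are re-predicted in every round instead of being kept once assigned.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10, relabel=False):
    """Self-training around logistic regression (illustrative sketch).

    relabel=False: a pseudo-label, once assigned, is kept in later rounds.
    relabel=True:  hypothetical "repeated labeling" variant in which all
                   pseudo-labels are discarded and re-predicted each round.
    """
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    pseudo_idx = np.array([], dtype=int)        # rows of X_unlab currently in use
    pseudo_y = np.array([], dtype=y_lab.dtype)  # their pseudo-labels

    for _ in range(max_iter):
        proba = clf.predict_proba(X_unlab)
        pred = clf.classes_[proba.argmax(axis=1)]
        confident = np.flatnonzero(proba.max(axis=1) >= threshold)

        if relabel:
            # Rebuild the pseudo-labeled set from scratch.
            new_idx, new_y = confident, pred[confident]
        else:
            # Only add samples that have not been pseudo-labeled yet.
            fresh = np.setdiff1d(confident, pseudo_idx)
            new_idx = np.concatenate([pseudo_idx, fresh])
            new_y = np.concatenate([pseudo_y, pred[fresh]])

        # Stop once the pseudo-labeled set no longer changes.
        if np.array_equal(new_idx, pseudo_idx) and np.array_equal(new_y, pseudo_y):
            break
        pseudo_idx, pseudo_y = new_idx, new_y

        # Retrain on the labeled data plus the current pseudo-labeled data.
        clf = LogisticRegression(max_iter=1000).fit(
            np.vstack([X_lab, X_unlab[pseudo_idx]]),
            np.concatenate([y_lab, pseudo_y]),
        )
    return clf, pseudo_idx


# Synthetic stand-in for a small labeled set and a larger unlabeled pool.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_lab, y_lab = X[:50], y[:50]
X_unlab = X[50:500]
X_test, y_test = X[500:], y[500:]

clf, used = self_train(X_lab, y_lab, X_unlab)
print("test accuracy:", clf.score(X_test, y_test))
print("pseudo-labeled samples used:", len(used))
```

The abstract also mentions random forest classifiers; swapping `LogisticRegression` for `sklearn.ensemble.RandomForestClassifier` in the sketch would be the corresponding change. scikit-learn additionally ships a ready-made wrapper, `sklearn.semi_supervised.SelfTrainingClassifier`, which may be a more convenient starting point than a hand-written loop.
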
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00206 | 19th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA/BDCloud/SocialCom/SustainCom 2021) |
Keywords | DocType | ISSN |
Self-training, Semi-supervised Learning, Logistic Regression, Random Forest | Conference | 2158-9178 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Leyu Gao | 1 | 0 | 0.34 |
Sandeep Shah | 2 | 0 | 0.34 |
Nasser Assery | 3 | 0 | 0.34 |
Xiaohong Yuan | 4 | 169 | 26.72 |
Xiuli Qu | 5 | 0 | 0.34 |
Kaushik Roy | 6 | 0 | 0.34 |