Abstract | ||
---|---|---|
This paper describes the participation of the SINAI team in Tasks 5 and 10 of the Social Media Mining for Health (#SMM4H) workshop at COLING-2022. These tasks focus on leveraging Twitter posts written in Spanish for healthcare research. The objective of Task 5 was to classify tweets reporting COVID-19 symptoms, while Task 10 required identifying disease mentions in Twitter posts. The presented systems explore large RoBERTa language models pre-trained on Twitter data for the tweet classification task and on general-domain data for the disease recognition task. We also present a text pre-processing methodology implemented in both systems and describe an initial weakly-supervised fine-tuning phase along with a submission post-processing procedure designed for Task 10. The systems obtained an F1-score of 0.84 on Task 5 and 0.77 on Task 10. |
Year | Venue | DocType |
---|---|---|
2022 | International Conference on Computational Linguistics | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mariia Chizhikova | 1 | 0 | 0.34 |
Pilar López-Úbeda | 2 | 0 | 6.76 |
Manuel Carlos Díaz-Galiano | 3 | 35 | 21.69 |
Luis Alfonso Ureña López | 4 | 257 | 53.93 |
Maite Martín-Valdivia | 5 | 25 | 6.80 |