Title
Predicting Flu Incidence From Portuguese Tweets
Abstract
Social media platforms encourage people to share diverse aspects of their daily lives. Among these, shared health-related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we evaluate the use of Twitter messages and search engine query logs to estimate the incidence rate of influenza-like illness in Portugal. Based on a classified dataset of 2704 tweets from Portugal, we obtained a precision of 0.78 and an F-measure of 0.83 for a Naive Bayes classifier with 650 textual features. We obtained a Pearson correlation coefficient of 0.89 (p < 0.001) between health-monitoring data from the Influenzanet project and the prediction of a multiple linear regression model that uses as predictors the relative frequencies estimated from the classifier output and from query logs. Although the Portuguese community on Twitter is small, our results are comparable to previous approaches in other languages, and indicate that this approach could be used in the future to complement other measures of disease incidence rates.
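The regression and correlation step described in the abstract can be illustrated with a minimal sketch. It assumes two weekly predictors (the relative frequency of flu-classified tweets and of flu-related queries) and a reference incidence series. The function names, variable names, and all numbers below are hypothetical placeholders, not the paper's data or code, and the fit is done in-sample for simplicity.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (centered normal equations)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical weekly values, invented for illustration only.
tweet_freq = [0.010, 0.014, 0.022, 0.031, 0.027, 0.018, 0.012]
query_freq = [0.002, 0.003, 0.006, 0.008, 0.009, 0.005, 0.003]
ili_rate = [12.0, 18.0, 30.0, 44.0, 41.0, 25.0, 15.0]

b0, b1, b2 = fit_two_predictor_ols(tweet_freq, query_freq, ili_rate)
predicted = [b0 + b1 * t + b2 * q for t, q in zip(tweet_freq, query_freq)]
r = pearson(predicted, ili_rate)
print(f"in-sample Pearson r between prediction and reference: {r:.3f}")
```

An in-sample correlation like this one overstates predictive quality; a held-out or cross-validated evaluation would be the more cautious choice when comparing against surveillance data.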
Year
2013
Venue
PROCEEDINGS IWBBIO 2013: INTERNATIONAL WORK-CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING
Keywords
flu incidence, disease outbreak monitoring, social media, user-generated content, text classification
Field
Data mining, Search engine, Social media, Incidence (epidemiology), Naive Bayes classifier, Portuguese, Correlation ratio, Statistics, Multiple linear regression model, Geography
DocType
Conference
Citations
5
PageRank
0.52
References
6
Authors
2
Name, Order, Citations, PageRank
Jose Carlos Santos, 1, 5, 0.52
Sérgio Matos, 2, 415, 29.51