Title
Medical case-driven classification of microblogs: characteristics and annotation
Abstract
In this paper, we study the use of microblogs as a source of information for medical intelligence gathering. The huge amount of irrelevant data in microblogs requires sophisticated filtering methods to identify the relevant postings. Because microblogs are characteristically sparse and noisy, choosing features for automatically classifying postings by their relevance to medical intelligence gathering requires additional consideration, and we analyze which features are well suited. The objective of this work is three-fold: 1) specifying annotation guidelines for creating a dataset for microblog classification, 2) studying the characteristics of tweets to decide on a well-suited feature set, and 3) using that feature set in an automatic classification system for relevance filtering of microblogs. The quality of the classifier is assessed in experiments with various feature sets. The evaluation shows that, despite the challenging characteristics of microblogs, the classifier achieves accuracy values of up to 89%. One main outcome of this work is a dataset of annotated Twitter data that can serve as a "gold standard" benchmark for further research in this domain.
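The abstract does not say which features or which learning algorithm the relevance classifier uses. As a minimal sketch of relevance filtering with a feature set, assuming a bag-of-words representation plus URL and mention marker features and a multinomial Naive Bayes learner (both illustrative choices, not necessarily the authors' method), the pipeline could look like this. The helper names `extract_features`, `NaiveBayes` and `accuracy` and the toy tweets in `TRAIN` are hypothetical; the toy data says nothing about the 89% reported above.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical feature set: URLs are tried first so "http" is not read as a word.
TOKEN_RE = re.compile(r"https?://\S+|[#@]?\w+")


def extract_features(tweet: str) -> list[str]:
    """Map a tweet to string features: words plus URL/mention markers (assumed)."""
    feats = []
    for tok in TOKEN_RE.findall(tweet.lower()):
        if tok.startswith(("http://", "https://")):
            feats.append("HAS_URL")  # every URL collapses to one marker feature
        elif tok.startswith("@"):
            feats.append("HAS_MENTION")  # every user mention collapses to one marker
        else:
            feats.append("w=" + tok.lstrip("#"))  # hashtags merged with plain words
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(extract_features(doc))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label in self.class_counts:
            # log prior + sum of smoothed log likelihoods of each feature
            score = math.log(self.class_counts[label] / n_docs)
            for f in extract_features(doc):
                count = self.feat_counts[label][f]
                score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Toy, made-up training tweets (hypothetical, not from the paper's dataset).
TRAIN = [
    ("Outbreak of measles reported in city hospital #health", "relevant"),
    ("New flu cases confirmed by health officials http://t.co/x", "relevant"),
    ("Cholera outbreak spreads, hospital overwhelmed", "relevant"),
    ("Watching the game tonight with friends @bob", "irrelevant"),
    ("This new phone is sick lol", "irrelevant"),
    ("Great pizza at the new place tonight", "irrelevant"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
    print(model.predict("measles outbreak confirmed at hospital"))
```

Comparing feature sets, as the experiments in the abstract do, would amount to swapping `extract_features` for alternative extractors and comparing `accuracy` on held-out annotated tweets.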
Year
2012
DOI
10.1145/2110363.2110421
Venue
IHI
Keywords
various feature set, additional consideration, automatic classification system, medical intelligence gathering, annotation guideline, medical case-driven classification, feature set, microblog classification, annotated twitter data, irrelevant data, automatic classification, social media, data analysis, gold standard
Field
Social media, Annotation, Information retrieval, Computer science, Microblogging, Filter (signal processing), Feature set, Artificial intelligence, Classifier (linguistics), Machine learning
DocType
Conference
Citations
6
PageRank
0.79
References
5
Authors
4
Name | Order | Citations | PageRank
Mustafa Sofean | 1 | 15 | 2.07
Kerstin Denecke | 2 | 140 | 23.57
Avaré Stewart | 3 | 111 | 10.56
Matthew Smith | 4 | 571 | 30.49