Abstract | ||
---|---|---|
In this paper, we study the use of microblogs as a source of information for medical intelligence gathering. The huge amount of irrelevant data available in microblogs requires sophisticated filtering methods to identify only the relevant postings. Microblogs are characteristically sparse and noisy, which requires additional considerations when selecting features for automatically classifying postings by relevance to medical intelligence gathering. We analyze which features are well suited. The objective of this work is threefold: 1) specifying annotation guidelines for creating a dataset for microblog classification, 2) studying the characteristics of tweets to decide on a well-suited feature set, and 3) using that feature set in an automatic classification system for relevance filtering of microblogs. The quality of the classifier is assessed in experiments with various feature sets. The evaluation shows that, despite the challenging characteristics of microblogs, the classifier achieves accuracy values of up to 89%. One main outcome of this work is a dataset of annotated Twitter data that can be used as a "gold standard" benchmark for further research in this domain. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2110363.2110421 | IHI |
Keywords | Field | DocType |
various feature set, additional consideration, automatic classification system, medical intelligence gathering, annotation guideline, medical case-driven classification, feature set, microblog classification, annotated twitter data, irrelevant data, automatic classification, social media, data analysis, gold standard | Social media, Annotation, Information retrieval, Computer science, Microblogging, Filter (signal processing), Feature set, Artificial intelligence, Classifier (linguistics), Machine learning | Conference |
Citations | PageRank | References |
6 | 0.79 | 5 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mustafa Sofean | 1 | 15 | 2.07 |
Kerstin Denecke | 2 | 140 | 23.57 |
Avaré Stewart | 3 | 111 | 10.56 |
Matthew Smith | 4 | 571 | 30.49 |