Title
Searching microblogs: coping with sparsity and document quality
Abstract
Two of the main challenges in microblog retrieval are the inherent sparsity of the documents and the difficulty of assessing their quality. Feature sparsity follows directly from the medium's restriction to short texts. Quality assessment is necessary because microblog documents range from spam, trivia and personal chatter to news broadcasts, information dissemination and reports on current hot topics. In this paper we analyze how these challenges influence standard retrieval models and propose methods to overcome the problems they pose. We consider the effect of sparsity on document length normalization and introduce "interestingness" as a static quality measure. Our results show that deliberately ignoring length normalization yields better retrieval results in general and that interestingness improves retrieval for underspecified queries.
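To make the length-normalization point concrete, the following is a minimal sketch, not the authors' implementation. It assumes a BM25-style scoring function, which the abstract does not name, and the function name `bm25_score`, the toy documents and the parameter defaults are illustrative choices. Setting the length parameter `b` to 0 switches length normalization off, so a short and a long post that each mention the query term once receive the same score.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25-style score of one document (illustrative sketch).

    b controls document length normalization: b=0 ignores length entirely,
    b=1 normalizes fully by the ratio of document length to average length.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    # Length normalization factor; equals 1.0 for every document when b == 0.
    norm = 1.0 - b + b * doc_len / avg_len
    score = 0.0
    for term in query_terms:
        freq = tf.get(term, 0)
        if freq == 0:
            continue
        df = doc_freq[term]
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * freq * (k1 + 1.0) / (freq + k1 * norm)
    return score

# Toy collection (assumed data): one short and one long post on the same topic.
docs = [
    ["storm", "warning"],
    ["storm", "warning", "for", "the", "coast", "tonight", "stay", "safe"],
]
doc_freq = Counter(term for doc in docs for term in set(doc))
avg_len = sum(len(doc) for doc in docs) / len(docs)

for b in (0.75, 0.0):
    scores = [bm25_score(["storm"], doc, doc_freq, len(docs), avg_len, b=b) for doc in docs]
    print(f"b={b}: short={scores[0]:.4f} long={scores[1]:.4f}")
```

With the default `b`, the shorter post is favored purely because of its length. With `b=0` the two scores coincide, which is the behavior the abstract refers to as deliberately ignoring length normalization. The sketch leaves out the "interestingness" measure, since the abstract does not say how it is combined with the retrieval score.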
Year: 2011
DOI: 10.1145/2063576.2063607
Venue: CIKM
Keywords: information dissemination, quality assessment, feature sparsity, document length normalization, better retrieval result, length normalization yield, inherent sparsity, standard retrieval model, current hot topic, document quality, static quality measure, microblog
Field: Data mining, Social media, Normalization (statistics), Information retrieval, Computer science, Coping (psychology), Microblogging, Document quality, Information Dissemination
DocType: Conference
Citations: 55
PageRank: 2.05
References: 12
Authors
4
Name                Order  Citations  PageRank
Nasir Naveed        1      142        6.66
Thomas Gottron      2      432        35.32
Jérôme Kunegis      3      874        51.20
Arifah Che Alhadi   4      144        7.72