Searching microblogs: coping with sparsity and document quality - Citegraph

Paper Info

Title
Searching microblogs: coping with sparsity and document quality

Abstract
Two of the main challenges in retrieval on microblogs are the inherent sparsity of the documents and difficulties in assessing their quality. The feature sparsity is inherent in the restriction of the medium to short texts. Quality assessment is necessary as microblog documents range from spam, trivia and personal chatter to news broadcasts, information dissemination and reports of current hot topics. In this paper we analyze how these challenges can influence standard retrieval models and propose methods to overcome the problems they pose. We consider the sparsity's effect on document length normalization and introduce "interestingness" as a static quality measure. Our results show that deliberately ignoring length normalization yields better retrieval results in general and that interestingness improves retrieval for underspecified queries.

Year	DOI	Venue
2011	10.1145/2063576.2063607	CIKM
Keywords	Field	DocType
information dissemination,quality assessment,feature sparsity,document length normalization,better retrieval result,length normalization yield,inherent sparsity,standard retrieval model,current hot topic,document quality,static quality measure,microblog	Data mining,Social media,Normalization (statistics),Information retrieval,Computer science,Coping (psychology),Microblogging,Document quality,Information Dissemination	Conference
Citations	PageRank	References
55	2.05	12
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Nasir Naveed	1	142	6.66
Thomas Gottron	2	432	35.32
Jérôme Kunegis	3	874	51.20
Arifah Che Alhadi	4	144	7.72
