Abstract | ||
---|---|---|
Two of the main challenges in retrieval on microblogs are the inherent sparsity of the documents and the difficulty of assessing their quality. Feature sparsity follows directly from the medium's restriction to short texts. Quality assessment is necessary because microblog documents range from spam, trivia, and personal chatter to news broadcasts, information dissemination, and reports on current hot topics. In this paper we analyze how these challenges influence standard retrieval models and propose methods to overcome the problems they pose. We consider the effect of sparsity on document length normalization and introduce "interestingness" as a static quality measure. Our results show that deliberately ignoring length normalization yields better retrieval results in general and that interestingness improves retrieval for underspecified queries. |
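The abstract's two ideas, switching off length normalization and adding a static quality signal, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a BM25-style scorer (the abstract only says "standard retrieval models"), in which setting `b = 0` removes the document length term, and it assumes a hypothetical linear interpolation, `combined_score`, for mixing a text score with an interestingness value in [0, 1]. All function and parameter names are illustrative.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """BM25 contribution of a single query term to one document's score.

    b controls length normalization: b=0 ignores document length entirely,
    b=1 normalizes fully by doc_len / avg_doc_len.
    """
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


def combined_score(text_score, interestingness, weight=0.5):
    """Hypothetical mix of a query-dependent score and a static quality score.

    The interpolation and the weight are assumptions for illustration only.
    """
    return (1.0 - weight) * text_score + weight * interestingness


if __name__ == "__main__":
    # Same term statistics, two documents of different length.
    for b in (0.75, 0.0):
        short = bm25_term_score(tf=1, doc_len=5, avg_doc_len=10, df=3, n_docs=100, b=b)
        long_ = bm25_term_score(tf=1, doc_len=20, avg_doc_len=10, df=3, n_docs=100, b=b)
        print(f"b={b}: short={short:.4f} long={long_:.4f}")
```

With `b = 0` both documents receive the same term score regardless of length, which corresponds to "ignoring length normalization" in this sketch; with the default `b = 0.75` the shorter document is favored.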
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/2063576.2063607 | CIKM |
Keywords | Field | DocType |
information dissemination, quality assessment, feature sparsity, document length normalization, better retrieval result, length normalization yield, inherent sparsity, standard retrieval model, current hot topic, document quality, static quality measure, microblog | Data mining, Social media, Normalization (statistics), Information retrieval, Computer science, Coping (psychology), Microblogging, Document quality, Information Dissemination | Conference |
Citations | PageRank | References |
55 | 2.05 | 12 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nasir Naveed | 1 | 142 | 6.66 |
Thomas Gottron | 2 | 432 | 35.32 |
Jérôme Kunegis | 3 | 874 | 51.20 |
Arifah Che Alhadi | 4 | 144 | 7.72 |