Title
Scalability influence on retrieval models: an experimental methodology
Abstract
Few studies in Information Retrieval (IR) have addressed the effectiveness and efficiency of Information Retrieval Systems (IRS) as corpus size scales. We propose a general experimental methodology for studying the influence of scalability on IR models. The methodology is based on constructing a collection in which a given characteristic C is the same for whatever portion of the collection is selected. This new collection, called uniform, can be split into sub-collections of growing size on which given properties are studied. We apply the methodology to WT10G (the TREC9 collection), taking the characteristic C to be the distribution of relevant documents in the collection. We build a uniform WT10G, sample it into sub-collections of increasing size, and use these sub-collections to study the impact of increasing corpus volume on standard IRS evaluation measures (recall/precision, high precision).
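The abstract does not spell out how the uniform collection is built, so the following is only a minimal sketch of one possible reading, not the authors' procedure. It assumes a single flat set of relevant document identifiers (TREC relevance is actually judged per topic), spreads relevant and non-relevant documents evenly over k partitions, and takes cumulative unions of those partitions as the sub-collections of increasing size. All function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Set


def build_uniform_partitions(doc_ids: Sequence[str], relevant_ids: Set[str],
                             k: int, seed: int = 0) -> List[List[str]]:
    """Split doc_ids into k partitions with near-equal shares of relevant docs.

    Assumption: "uniform" is read as "each portion holds the same proportion
    of relevant documents"; the source abstract gives no construction details.
    """
    rng = random.Random(seed)
    relevant = [d for d in doc_ids if d in relevant_ids]
    others = [d for d in doc_ids if d not in relevant_ids]
    rng.shuffle(relevant)
    rng.shuffle(others)
    parts: List[List[str]] = [[] for _ in range(k)]
    # Deal each group round-robin so every partition gets an even share.
    for i, d in enumerate(relevant):
        parts[i % k].append(d)
    for i, d in enumerate(others):
        parts[i % k].append(d)
    return parts


def growing_subcollections(parts: List[List[str]]) -> List[List[str]]:
    """Return nested sub-collections: the first partition, the first two, etc."""
    subs: List[List[str]] = []
    current: List[str] = []
    for p in parts:
        current = current + p
        subs.append(current)
    return subs


def precision_at_n(ranking: Sequence[str], relevant_ids: Set[str], n: int) -> float:
    """Fraction of the top-n ranked documents that are relevant."""
    top = ranking[:n]
    if not top:
        return 0.0
    return sum(1 for d in top if d in relevant_ids) / len(top)
```

With such sub-collections in hand, a retrieval system could be run on each one and a measure such as `precision_at_n` compared across sizes; because the relevant-document ratio is held constant, differences would then be attributable to corpus volume rather than to a changing density of relevant documents.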
Year
2005
DOI
10.1007/978-3-540-31865-1_28
Venue
ECIR
Keywords
corpus size, ir model, scalability influence, characteristic c, general experimental methodology, trec9 collection, information retrieval systems, high precision, information retrieval, corpus volume increase, retrieval model, new collection, information retrieval system
Field
Information system, Data mining, Information retrieval, Computer science, Recall, Scalability
DocType
Conference
Volume
3408
ISSN
0302-9743
ISBN
3-540-25295-9
Citations
1
PageRank
0.40
References
16
Authors
2
Name
Order
Citations
PageRank
Amélie Imafouo133.13
Michel Beigbeder27223.49