Abstract | ||
---|---|---|
Few works in Information Retrieval (IR) have tackled the effectiveness and efficiency of Information Retrieval Systems (IRS) as corpus size scales. We propose a general experimental methodology for studying the influence of scalability on IR models. The methodology is based on constructing a collection in which a given characteristic C is the same whatever portion of the collection is selected. This new collection, called uniform, can be split into sub-collections of growing size on which given properties are then studied. We apply the methodology to WT10G (the TREC9 collection), taking the characteristic C to be the distribution of relevant documents in the collection. We build a uniform WT10G, sample it into sub-collections of increasing size, and use these sub-collections to study the impact of increasing corpus volume on standard IRS evaluation measures (recall/precision, high precision). |
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/978-3-540-31865-1_28 | ECIR |
Keywords | Field | DocType |
corpus size,ir model,scalability influence,characteristic c,general experimental methodology,trec9 collection,information retrieval systems,high precision,information retrieval,corpus volume increase,retrieval model,new collection,information retrieval system | Information system,Data mining,Information retrieval,Computer science,Recall,Scalability | Conference |
Volume | ISSN | ISBN |
3408 | 0302-9743 | 3-540-25295-9 |
Citations | PageRank | References |
1 | 0.40 | 16 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Amélie Imafouo | 1 | 3 | 3.13 |
Michel Beigbeder | 2 | 72 | 23.49 |
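
The abstract's idea of a uniform collection split into sub-collections of growing size can be illustrated with a minimal sketch. The abstract does not describe the actual construction procedure, so everything below is an assumption made for illustration: the function names `build_uniform_slices` and `growing_subcollections` are hypothetical, relevance is treated as a single pooled set of document ids rather than per topic, and uniformity is approximated by dealing relevant and non-relevant documents round-robin into equal slices.

```python
import random


def build_uniform_slices(doc_ids, relevant_ids, n_slices, seed=0):
    """Partition doc_ids into n_slices slices with near-equal relevant share.

    Assumption: relevant_ids is one pooled set of relevant documents;
    this is not the procedure from the paper, only an illustration.
    """
    rng = random.Random(seed)
    relevant = [d for d in doc_ids if d in relevant_ids]
    others = [d for d in doc_ids if d not in relevant_ids]
    rng.shuffle(relevant)
    rng.shuffle(others)

    slices = [[] for _ in range(n_slices)]
    # Deal each group round-robin so slice sizes differ by at most one
    # document per group, keeping the relevant share close to constant.
    for i, d in enumerate(relevant):
        slices[i % n_slices].append(d)
    for i, d in enumerate(others):
        slices[i % n_slices].append(d)
    return slices


def growing_subcollections(slices):
    """Yield nested sub-collections: first slice, first two slices, ..."""
    current = []
    for s in slices:
        current = current + s
        yield list(current)


if __name__ == "__main__":
    # Toy data: 1000 documents, every 20th one marked relevant.
    docs = [f"d{i}" for i in range(1000)]
    rel = {f"d{i}" for i in range(0, 1000, 20)}
    for sub in growing_subcollections(build_uniform_slices(docs, rel, 5)):
        share = sum(d in rel for d in sub) / len(sub)
        print(len(sub), round(share, 3))
```

Each sub-collection produced this way contains the previous one, so an evaluation measure computed on successive sub-collections reflects the change in corpus volume while the share of relevant documents stays fixed.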