Title
Anserini: Enabling the Use of Lucene for Information Retrieval Research
Abstract
Software toolkits play an essential role in information retrieval research. Most open-source toolkits developed by academics are designed to facilitate the evaluation of retrieval models over standard test collections. Efforts are generally directed toward better ranking and less attention is usually given to scalability and other operational considerations. On the other hand, Lucene has become the de facto platform in industry for building search applications (outside a small number of companies that deploy custom infrastructure). Compared to academic IR toolkits, Lucene can handle heterogeneous web collections at scale, but lacks systematic support for evaluation over standard test collections. This paper introduces Anserini, a new information retrieval toolkit that aims to provide the best of both worlds, to better align information retrieval practice and research. Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks. Our initial efforts have focused on three functionalities: scalable, multi-threaded inverted indexing to handle modern web-scale collections, streamlined IR evaluation for ad hoc retrieval on standard test collections, and an extensible architecture for multi-stage ranking. Anserini ships with support for many TREC test collections, providing a convenient way to replicate competitive baselines right out of the box. Experiments verify that our system is both efficient and effective, providing a solid foundation to support future research.
Year | DOI | Venue
2017 | 10.1145/3077136.3080721 | SIGIR
Field | DocType | ISBN
Data mining, IR evaluation, World Wide Web, Ranking, Information retrieval, Computer science, Search engine indexing, Software, Extensible architecture, Scalability | Conference | 978-1-4503-5022-8
Citations | PageRank | References
21 | 1.30 | 5
Authors
3
Name | Order | Citations | PageRank
Peilin Yang | 1 | 100 | 12.00
Hui Fang | 2 | 918 | 63.03
Jimmy Lin | 3 | 4800 | 376.93