Title
TREC Dynamic Domain: Polar Science
Abstract
This paper outlines the creation of the Polar dataset for the TREC Dynamic Domain (TREC-DD) track. The techniques used to build the dataset fall into two categories: information extraction with Apache Tika and information retrieval with Apache Nutch. First, we expanded the parsing capabilities of Apache Tika, an open source framework for text and metadata extraction, to expose more searchable content within Polar data repositories. Second, we used Apache Nutch, a distributed search engine that runs on top of Apache Hadoop, to crawl three prominent Polar data repositories: the National Science Foundation Advanced Cooperative Arctic Data and Information System (ACADIS), the National Snow and Ice Data Center (NSIDC) Arctic Data Explorer (ADE), and the National Aeronautics and Space Administration (NASA) Antarctic Master Directory (AMD). Because finding data is often a primary obstacle to scientific discovery, including the Polar dataset in TREC-DD helps advance science through data discovery and gives TREC-DD a new challenge in search relevancy.
Year: 2015
Venue: TREC
Field: Information system, Data discovery, Data mining, Metadata, World Wide Web, Information retrieval, Directory, Computer science, Information extraction, Parsing, National Snow and Ice Data Center, Arctic
DocType: Conference
Citations: 1
PageRank: 0.36
References: 1
Authors: 5
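Author list (order, name, citations, PageRank):
1. Annie Bryant Burgess (citations: 1, PageRank: 0.36)
2. Chris A. Mattmann (citations: 200, PageRank: 25.39)
3. Giuseppe Totaro (citations: 1, PageRank: 0.70)
4. Lewis John McGibbney (citations: 1, PageRank: 1.04)
5. Paul M. Ramirez (citations: 11, PageRank: 1.65)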