Abstract |
---|
Despite the advancements in search engine features, ranking methods, technologies, and the availability of programmable APIs, current-day open-access digital libraries still rely on crawl-based approaches for acquiring their underlying document collections. In this paper, we propose a novel search-driven framework for acquiring documents for such scientific portals. Within our framework, publicly-available research paper titles and author names are used as queries to a Web search engine. We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries. Moreover, through a combination of title and author name search, we were able to recover 78% of the original searched titles. |
Year | DOI | Venue | DocType | Volume | Citations | PageRank | References | Authors |
---|---|---|---|---|---|---|---|---|
2020 | 10.18653/v1/2020.sdp-1.20 | SDP@EMNLP | Conference | 2020.sdp-1 | 0 | 0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Krutarth Patel | 1 | 1 | 2.72 |
Cornelia Caragea | 2 | 520 | 53.61 |
Sujatha Das Gollapalli | 3 | 74 | 6.24 |