Title | ||
---|---|---|
The feasibility of investing in manual correction of metadata for a large-scale digital library |
Abstract | ||
---|---|---|
Given a large-scale digital library that automatically crawls and parses PDF files to generate metadata for documents and authors, we estimate the number of person-hours required to correct a small portion of the metadata, in the hope that a large portion of users can benefit from these corrections. We obtain user requests by analyzing CiteSeerX's log files from September 2009 to March 2013. We find that the distribution of user search requests is highly imbalanced: most document search queries and author search queries concentrate on a small set of terms. As a result, even for a large-scale digital library, we estimate that it is affordable to invest a few person-hours in checking the correctness of a small number of metadata records, and thereby benefit a good portion of document search and author search requests. |
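
The argument in the abstract rests on a skewed query distribution: if a few terms account for most requests, correcting the metadata behind those terms reaches most users. The following is a minimal Python sketch of that reasoning, not the authors' method. The function names, the toy log, and the minutes-per-record figure are all hypothetical and do not come from the paper or from CiteSeerX.

```python
from collections import Counter


def coverage_of_top_terms(queries, k):
    """Return the fraction of all requests that use one of the k most frequent terms."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(k) returns the k (term, count) pairs with the highest counts.
    return sum(count for _, count in counts.most_common(k)) / total


def person_hours(num_records, minutes_per_record):
    """Estimate the effort of checking num_records at an assumed per-record cost."""
    return num_records * minutes_per_record / 60.0


# Hypothetical toy log: one entry per search request (not real CiteSeerX data).
toy_log = ["svm"] * 6 + ["lda"] * 3 + ["pagerank"] * 1

top1 = coverage_of_top_terms(toy_log, 1)   # share of requests served by the top term
top2 = coverage_of_top_terms(toy_log, 2)   # share served by the top two terms
effort = person_hours(30, 2)               # 30 records at an assumed 2 minutes each
```

With a real query log, plotting the coverage against `k` would show how quickly the benefit saturates, which is the trade-off the abstract describes.
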
Year | Venue | Keywords |
---|---|---|
2014 | JCDL | practicability,digital libraries,digital library,document search queries,manual metadata correction,author search queries,metadata correction,user satisfaction,pdf file parsing,meta data,human-aided metadata generation,large-scale digital library,document handling,user experience,query processing,cite-seerx log files,data mining,indexes |
Field | DocType | ISBN |
Metadata repository,Metadata,User experience design,World Wide Web,Information retrieval,Computer science,Correctness,Digital library,Portable document format | Conference | 978-1-4799-5569-5 |
Citations | PageRank | References |
0 | 0.34 | 12 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hung-Hsuan Chen | 1 | 246 | 16.71 |
Madian Khabsa | 2 | 237 | 18.81 |
C. Lee Giles | 3 | 11154 | 1549.48 |