Title
SeerSuite: developing a scalable and reliable application framework for building digital libraries by crawling the web
Abstract
SeerSuite is a framework for scientific and academic digital libraries and search engines built by crawling scientific and academic documents from the web with a focus on providing reliable, robust services. In addition to full text indexing, SeerSuite supports autonomous citation indexing and automatically links references in research articles to facilitate navigation, analysis and evaluation. SeerSuite enables access to extensive document, citation, and author metadata by automatically extracting, storing and indexing metadata. SeerSuite also supports MyCiteSeer, a personal portal that allows users to monitor documents, store user queries, build document portfolios, and interact with the document metadata. We describe the design of SeerSuite and the deployment and usage of CiteSeerX as an instance of SeerSuite.
Year
2010
Venue
WebApps
Keywords
academic digital library, full text indexing, personal portal, author metadata, reliable application framework, indexing metadata, autonomous citation indexing, academic document, document metadata, extensive document, document portfolio
Field
Metadata, World Wide Web, Search engine, Crawling, Software deployment, Information retrieval, Computer science, Citation, Search engine indexing, Digital library, Scalability
DocType
Conference
Citations
16
PageRank
1.05
References
17
Authors
6
Name                      Order  Citations  PageRank
Pradeep B. Teregowda      1      54         5.93
Isaac G. Councill         2      469        27.27
R. Juan Pablo Fernández   3      16         1.05
Madian Khabsa             4      237        18.81
Shuyi Zheng               5      256        11.22
C. Lee Giles              6      11154      1549.48