Title
Towards building a scholarly big data platform: Challenges, lessons and opportunities
Abstract
We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud. It crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes the data with a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stores the results in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided through OAI and RESTful APIs. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.
Year: 2014
Venue: JCDL
Keywords: distributed repositories, log data analytics, information extraction, application program interfaces, data privacy, user information, parallel programming, scholarly information harvesting, mapreduce-based crawl-extraction-ingestion workflow, cei workflow, scholarly big data, scholarly applications, recommender systems, secured private cloud, dynamic scheduler, scholarly focused crawler, data crawling, collaborator discovery, data storage, big data platform, restful api, big data, oai, citation recommendation, cloud computing, distributed databases, citation analysis
Field: World Wide Web, Architecture, Information retrieval, Computer science, Server, User information, Information extraction, Focused crawler, Big data, Workflow, Cloud computing
DocType: Conference
ISSN: 2575-7865
ISBN: 978-1-4799-5569-5
Citations: 14
PageRank: 0.72
References: 32
Authors: 11
Name | Order | Citations | PageRank
Zhaohui Wu | 1 | 158 | 13.51
Jian Wu | 2 | 45 | 2.92
Madian Khabsa | 3 | 237 | 18.81
Kyle Williams | 4 | 208 | 21.61
Hung-Hsuan Chen | 5 | 246 | 16.71
Wenyi Huang | 6 | 162 | 8.16
Suppawong Tuarob | 7 | 227 | 18.54
Sagnik Ray Choudhury | 8 | 74 | 5.86
Ororbia II Alexander G. | 9 | 121 | 17.83
Prasenjit Mitra | 10 | 2439 | 167.89
C. Lee Giles | 11 | 11154 | 1549.48