Title
A Large Scholarly Corpus: A Bird's-Eye View
Abstract
In this paper we present a new, very large and rich Comprehensive Scholarly Corpus (CompScholarCorp) as a platform and data source for future research. Our corpus contains records of 1,044,454 papers and 472,365 unique authors, with substantial publication metadata for each record. We have integrated the data we collected from 276 publishers into a uniform and consistent XML data format. The corpus is designed to be compatible with DBLP, enabling existing research to utilise it directly. As an initial analysis, we present a number of visualisations to better understand the data, provide some analytics, and present a rule of thumb we have observed for citations.
Year
2017
DOI
10.1109/eScience.2017.75
Venue
2017 IEEE 13th International Conference on e-Science (e-Science)
Keywords
Corpus, Scholar, Collaborative network, Social Network, Citation, Network Visualisation
Field
Data source, Data mining, Data visualization, Information retrieval, XML, Computer science, Visualization, Xml data, Corpus linguistics, Analytics, Cloud computing
DocType
Conference
ISSN
2325-372X
ISBN
978-1-5386-2687-0
Citations
0
PageRank
0.34
References
2
Authors
2
Name               Order   Citations   PageRank
Yashar Najaflou    1       0           0.34
Kris Bubendorfer   2       341         29.28