Title
LODStats --- an extensible framework for high-performance dataset analytics
Abstract
One of the major obstacles to wider use of web data is the difficulty of obtaining a clear picture of the available datasets. To reuse, link, revise, or query a dataset published on the Web, it is important to know the structure, coverage, and coherence of the data. To obtain such information, we developed LODStats, a statement-stream-based approach for gathering comprehensive statistics about datasets adhering to the Resource Description Framework (RDF). LODStats is based on the declarative description of statistical dataset characteristics. Its main advantages over other approaches are a smaller memory footprint and significantly better performance and scalability. We integrated LODStats with the CKAN dataset metadata registry and obtained a comprehensive picture of the current state of a significant part of the Data Web.
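To make the idea of statement-stream-based, declaratively described statistics concrete, here is a minimal sketch in Python. It is not the LODStats implementation or its API: the names `parse_ntriples`, `STATISTICS`, and `collect` are hypothetical, the parser handles only a simplified one-statement-per-line N-Triples form, and the three statistics are illustrative choices. The point it shows is that each statement is read once and only counters stay in memory.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_ntriples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield (subject, predicate, object) from simplified N-Triples lines.

    Assumption: one statement per line, whitespace-separated terms, and a
    trailing "."; this is not a complete N-Triples parser.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip().rstrip(".").rstrip()
        yield subject.strip("<>"), predicate.strip("<>"), obj.strip("<>")


# Hypothetical declarative definitions: each statistic is a (filter, key)
# pair. The filter selects relevant statements; the key names the counter
# bucket to increment for a matching statement.
STATISTICS: Dict[str, Tuple[Callable[[Triple], bool], Callable[[Triple], str]]] = {
    "triples": (lambda t: True, lambda t: "total"),
    "property_usage": (lambda t: True, lambda t: t[1]),
    "class_usage": (lambda t: t[1] == RDF_TYPE, lambda t: t[2]),
}


def collect(triples: Iterable[Triple]) -> Dict[str, Counter]:
    """Compute all declared statistics in a single pass over the stream."""
    results = {name: Counter() for name in STATISTICS}
    for triple in triples:  # the dataset itself is never held in memory
        for name, (matches, key) in STATISTICS.items():
            if matches(triple):
                results[name][key(triple)] += 1
    return results


# Made-up sample data for illustration only.
SAMPLE = [
    "<http://example.org/alice> <" + RDF_TYPE + "> <http://example.org/Person> .",
    '<http://example.org/alice> <http://example.org/name> "Alice" .',
    "<http://example.org/bob> <" + RDF_TYPE + "> <http://example.org/Person> .",
]

stats = collect(parse_ntriples(SAMPLE))
print(stats["class_usage"].most_common())
```

Because `collect` accepts any iterable, the same function could be fed lines from an open file handle, so memory use would depend on the number of distinct counter keys rather than on dataset size. Adding a statistic means adding one entry to `STATISTICS`, which is the extensibility property this sketch is meant to illustrate.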
Year: 2012
DOI: 10.1007/978-3-642-33876-2_31
Venue: EKAW
Keywords: comprehensive picture, data web, extensible framework, statistical dataset characteristic, resource description framework, web data, high-performance dataset analytics, comprehensive statistic, available datasets, better performance, clear picture, integrated lodstats
Field: Data mining, Computer science, Reuse, Data Web, SPARQL, Metadata registry, Analytics, Memory footprint, RDF, Scalability
DocType: Conference
Citations: 84
PageRank: 5.07
References: 9
Authors: 4
Name             Order  Citations  PageRank
Sören Auer       1      5711       418.56
Jan Demter       2      84         5.07
Michael Martin   3      184        18.09
Jens Lehmann     4      5375       355.08