Title
Dataset Reuse: Toward Translating Principles To Practice
Abstract
The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes one dataset more reusable than another and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on datasets on GitHub, a popular open platform for sharing code and data. We describe a corpus of more than 1.4 million data files, from over 65,000 repositories. Using GitHub's engagement metrics as proxies for dataset reuse, we relate them to reuse features from the literature and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between principles and actionable insights that allow data publishers and tool designers to implement functionalities that provably facilitate reuse.
Year
2020
DOI
10.1016/j.patter.2020.100136
Venue
PATTERNS
Keywords
data portals, dataset reuse, human-data interaction, neural networks, reuse prediction
DocType
Journal
Volume
1
Issue
8
ISSN
2666-3899
Citations
0
PageRank
0.34
References
0
Authors
4
Name                Order  Citations  PageRank
Laura Koesten       1      6          3.50
Pavlos Vougiouklis  2      0          0.34
Elena Simperl       3      1069       122.60
Paul Groth          4      1709       139.30