Title
Clowder: Open Source Data Management for Long Tail Data
Abstract
Clowder is an open source data management system to support data curation of long tail data and metadata across multiple research domains and diverse data types. Institutions and labs can install and customize their own instance of the framework on local hardware or on remote cloud computing resources to provide a shared service to distributed communities of researchers. Data can be ingested directly from instruments or manually uploaded by users and then shared with remote collaborators using a web front end. We discuss some of the challenges encountered in designing and developing a system that can be easily adapted to different scientific areas including digital preservation, geoscience, material science, medicine, social science, cultural heritage and the arts. Some of these challenges include support for large amounts of data, horizontal scaling of domain specific preprocessing algorithms, ability to provide new data visualizations in the web browser, a comprehensive Web service API for automatic data ingestion and curation, a suite of social annotation and metadata management features to support data annotation by communities of users and algorithms, and a web based front-end to interact with code running on heterogeneous clusters, including HPC resources.
Year: 2018
DOI: 10.1145/3219104.3219159
Venue: PEARC '18
DocType: Conference
Citations: 1
PageRank: 0.36
References: 0
Authors: 9
Name | Order | Citations | PageRank
Luigi Marini | 1 | 8514.61
Indira Gutierrez-Polo | 2 | 1 | 0.70
Rob Kooper | 3 | 1234235.10
Sandeep Puthanveetil Satheesan | 4 | 2 | 3.09
Maxwell Burnette | 5 | 1 | 1.04
Jong Lee | 6 | 9 | 2.72
Todd Nicholson | 7 | 4 | 1.31
Yan Zhao | 8 | 1613.39
Kenton McHenry | 9 | 5411.15