Datashim and Its Applications in Bioinformatics - Citegraph

Paper Info

Title
Datashim and Its Applications in Bioinformatics

Abstract
Bioinformatics pipelines depend on shared POSIX filesystems for their input, output, and intermediate data storage. Containerization makes it more difficult for workloads to access shared filesystems. In our previous study, we ran both ML and non-ML pipelines on Kubeflow successfully, but the storage solutions were complex and suboptimal. In this article, we introduce a new concept, Dataset, and its corresponding resource as a native Kubernetes object. We have implemented the concept in a new framework, Datashim, which takes care of all the low-level details of data access in Kubernetes pods. Its pluggable architecture is designed for the development of caching, scheduling, and governance plugins. Together, they manage the entire lifecycle of the custom resource Dataset. We use Datashim to serve data from object stores to both ML and non-ML pipelines on Kubeflow. We feed training data into ML models directly through Datashim instead of downloading it to local disks, which makes the input scalable. We have enhanced the durability of training metadata by storing it in a dataset, which also simplifies the setup of TensorBoard, independent of the notebook server. For the non-ML pipeline, we have simplified the 1000 Genomes Project pipeline by injecting datasets into the pipeline dynamically. We have thus established a new resource type, Dataset, to represent the concept of a data source on Kubernetes, with our novel framework, Datashim, managing its lifecycle.

Year	2021
DOI	10.1007/978-3-030-90539-2_28
Venue	HIGH PERFORMANCE COMPUTING - ISC HIGH PERFORMANCE DIGITAL 2021 INTERNATIONAL WORKSHOPS
Keywords	Datashim, Kubeflow, Kubernetes, Bioinformatics
DocType	Conference
Volume	12761
ISSN	0302-9743
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Yiannis Gkoufas	1	17	5.96
David Yu Yuan	2	0	0.34
Christian Pinto	3	0	2.03
Panos K. Koutsovasilis	4	0	0.34
Srikumar Venugopal	5	0	0.68
