Title
VIFI: Virtual Information Fabric Infrastructure for Data-Driven Discoveries from Distributed Earth Science Data
Abstract
Traditional data analytics involves manually identifying and downloading relevant distributed datasets to a common server or cluster where the analytics processes are executed. For very large distributed datasets, this slows down the analytics process, and for extremely large datasets it is often impractical to download such massive volumes due to bandwidth limitations. In such cases, data scientists must be given explicit access to the remote servers hosting the datasets, and must possess detailed knowledge of the server infrastructure and environments, in order to send their analytics packages to the data owner. This alternative poses considerable challenges and has not been adequately addressed to date. In this paper, we describe a novel approach to this challenge, called Virtual Information Fabric Infrastructure (VIFI), which allows users to conduct analytics in place by distributing analytics to the distributed repositories without moving the underlying datasets to a common location. By sending automated analytics scripts to the data and orchestrating the distributed infrastructure, VIFI allows users to execute and coordinate complex analytics activities in place with the data on multiple data repositories. VIFI uses Docker containerization technology along with the open-source workflow tool NIFI to achieve automated orchestration and distributed analytics without requiring users to possess detailed knowledge of the distributed repositories and their underlying infrastructure. We demonstrate and evaluate VIFI on an Earth Science use case for the evaluation of precipitation over the Great Plains, involving analytics on massive distributed data repositories.
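The analytics-in-place idea described in the abstract can be illustrated with a minimal sketch. It is not the VIFI implementation: the `Repository` class, the `run_in_place` method, and the sample precipitation values are hypothetical stand-ins, and the containerized execution and workflow orchestration the abstract mentions are replaced here by plain in-process function calls. The sketch only shows the data-movement pattern, in which a small task travels to each repository and only a compact partial result travels back.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Repository:
    """Hypothetical stand-in for a remote data site; its data never leaves it."""
    name: str
    precipitation_mm: List[float]

    def run_in_place(
        self, task: Callable[[List[float]], Tuple[float, int]]
    ) -> Tuple[float, int]:
        # Assumption: a real deployment would run the task in a container
        # at the data site. Here it is just a local function call.
        return task(self.precipitation_mm)


def partial_sum(values: List[float]) -> Tuple[float, int]:
    """Analytics task shipped to the data; returns only a small summary."""
    return (sum(values), len(values))


def orchestrate(repos: List[Repository]) -> float:
    """Send the task to every repository and merge the partial results."""
    partials = [repo.run_in_place(partial_sum) for repo in repos]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Made-up sample values, for illustration only.
repos = [
    Repository("site_a", [1.0, 2.0, 3.0]),
    Repository("site_b", [4.0, 6.0]),
]
mean_precip = orchestrate(repos)
print(mean_precip)
```

Only the `(sum, count)` pairs cross the boundary between a repository and the orchestrator, so the cost of combining results is independent of the size of the underlying datasets.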
Year: 2017
Venue: 2017 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTED, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI)
Field: Data modeling, Data analysis, Computer science, Upload, Server, Earth science, Distributed database, Analytics, Workflow, Orchestration (computing)
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 9
Name                 Order  Citations  PageRank
Ashit Talukder       1      117        11.66
Mohammed Elshambakey 2      0          1.69
Sameer Wadkar        3      0          0.34
Huikyo Lee           4      3          1.85
Luca Cinquini        5      128        13.91
Shannon Schlueter    6      0          0.68
Isaac Cho            7      52         9.36
Wenwen Dou           8      459        29.03
Daniel J. Crichton   9      69         11.65