Title
Bandwidth Modeling in Large Distributed Systems for Big Data Applications
Abstract
The emergence of Big Data applications poses new challenges in data management, such as processing and moving massive volumes of data. Volunteer computing has proven itself as a distributed paradigm that can fully support Big Data generation. This paradigm uses a large number of heterogeneous and unreliable Internet-connected hosts to provide petascale computing power for scientific projects. As data sizes and the number of devices that can potentially join a volunteer computing project increase, host bandwidth can become a major obstacle to analyzing the data these projects generate, especially if the analysis runs concurrently, using either in-situ or in-transit processing. In this paper, we propose a bandwidth model for volunteer computing projects based on real trace data taken from the Docking@Home project, covering more than 280,000 hosts over a 5-year period. We validate the proposed statistical model using model-based and simulation-based techniques. Our modeling provides valuable insights into the concurrent integration of data generation with in-situ and in-transit analysis in the volunteer computing paradigm.
Year
2014
DOI
10.1109/PDCAT.2014.12
Venue
2014 15th International Conference on Parallel and Distributed Computing, Applications and Technologies
Keywords
Volunteer Computing,Big Data,Internet Bandwidth,Statistical Modeling
Field
Data science,Programming with Big Data in R,Computer science,Bandwidth (signal processing),Statistical model,Data management,Big data,Volunteer computing,Test data generation,Distributed computing
DocType
Conference
ISSN
2379-5352
Citations
1
PageRank
0.39
References
16
Authors
3
Name, Order, Citations, PageRank
Bahman Javadi, 1, 666, 40.59
Boyu Zhang, 2, 71, 17.54
Michela Taufer, 3, 352, 53.04