Title
Decentralized Load Balancing for Improving Reliability in Heterogeneous Distributed Systems
Abstract
A probabilistic analytical framework for decentralized load balancing (LB) strategies for heterogeneous distributed-computing systems (DCSs) is presented with the overall goal of maximizing the service reliability in the presence of random failures. The service reliability of a DCS is defined as the probability of successfully serving a specified workload before all the computing nodes fail permanently. In the framework considered, the service and failure times of nodes are random, the communication times in the network are both tangible and stochastic, and LB is performed synchronously by all the nodes during the runtime of each submitted workload. By taking a novel regenerative stochastic-analysis approach, the service reliability of a two-node DCS is characterized analytically. This formulation, in turn, is used to form and solve an optimization problem, yielding LB policies with maximal reliability. A scalable extension of the two-node formulation to a system of arbitrary size is also presented. The validity of the proposed theory is studied using both Monte Carlo simulations and real experiments on a small-scale testbed.
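As a rough illustration of the service-reliability definition in the abstract, the sketch below estimates it by Monte Carlo simulation for a deliberately simplified model. This is not the paper's regenerative analysis: it assumes exponential service and failure times, no task transfers between nodes (so no load balancing and no communication delay), and counts a run as successful only if every node finishes its own queue before it fails. All function names, rates, and queue sizes are hypothetical. Under these assumptions a closed form exists, which gives a sanity check on the simulation.

```python
import random


def simulate_once(tasks, service_rates, failure_rates, rng):
    """One run: True if every node serves its own queue before failing.

    Assumption (not from the paper): no tasks are exchanged between nodes.
    """
    for m, mu, lam in zip(tasks, service_rates, failure_rates):
        failure_time = rng.expovariate(lam)  # permanent failure instant
        busy_time = sum(rng.expovariate(mu) for _ in range(m))  # time to serve m tasks
        if busy_time > failure_time:
            return False
    return True


def estimate_reliability(tasks, service_rates, failure_rates, runs=20000, seed=0):
    """Monte Carlo estimate of the probability that the whole workload is served."""
    rng = random.Random(seed)
    successes = sum(
        simulate_once(tasks, service_rates, failure_rates, rng) for _ in range(runs)
    )
    return successes / runs


def closed_form_reliability(tasks, service_rates, failure_rates):
    """Exact value for the simplified model.

    With exponential times, each task completes before the node fails with
    probability mu / (mu + lam), independently for each task.
    """
    reliability = 1.0
    for m, mu, lam in zip(tasks, service_rates, failure_rates):
        reliability *= (mu / (mu + lam)) ** m
    return reliability


if __name__ == "__main__":
    tasks = (10, 5)               # hypothetical initial queues
    service_rates = (2.0, 1.0)    # hypothetical tasks per unit time
    failure_rates = (0.05, 0.02)  # hypothetical failures per unit time
    print(estimate_reliability(tasks, service_rates, failure_rates))
    print(closed_form_reliability(tasks, service_rates, failure_rates))
```

Running the script prints the simulated estimate next to the closed-form value; the two should agree up to sampling noise. Modeling task transfers with random communication delays, as the abstract describes, would require extending `simulate_once` considerably.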
Year
2009
DOI
10.1109/ICPPW.2009.50
Venue
ICPP Workshops
Keywords
arbitrary size system,load balancing,reliability,renewal theory,probabilistic analytical framework,distributed computing,monte-carlo simulation,maximal reliability,lb policy,specified workload,two-node dcs,random failure,improving reliability,service reliability,queuing theory,two-node formulation,decentralized load,monte carlo methods,mathematical model,queueing theory,probability,reliability theory,probability density function,optimization problem,distributed processing,monte carlo simulation,communication networks,stochastic analysis,software reliability,load balance,monte carlo simulations,stochastic processes,resource allocation
Field
Load management,Telecommunications network,Load balancing (computing),Computer science,Parallel computing,Queueing theory,Probabilistic logic,Optimization problem,Reliability theory,Distributed computing,Scalability
DocType
Conference
ISSN
1530-2016
Citations
4
PageRank
0.42
References
9
Authors
3
Name, Order, Citations, PageRank
Jorge E. Pezoa, 1, 119, 15.76
Sagar Dhakal, 2, 72, 4.20
Majeed M. Hayat, 3, 213, 26.36