Title
Efficient reduction for wait-free termination detection in a crash-prone distributed system
Abstract
We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm $\mathcal{A}$ that has been designed for a failure-free environment into a termination detection algorithm $\mathcal{B}$ that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash. Let μ(n,M) and δ(n,M) denote the message complexity and detection latency, respectively, of $\mathcal{A}$ when the system has n processes and the underlying computation exchanges M application messages. The message complexity of $\mathcal{B}$ is at most O(n + μ(n,0)) messages per failure more than the message complexity of $\mathcal{A}$. Also, its detection latency is at most O(δ(n,0)) per failure more than that of $\mathcal{A}$. Furthermore, the overhead (that is, the amount of control data piggybacked) on an application message increases by only O(log n) bits per failure. The fault-tolerant termination detection algorithm resulting from the transformation satisfies two desirable properties. First, it can tolerate failure of up to n–1 processes, that is, it is wait-free. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash, that is, it is fault-reactive. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.
Year
DOI
Venue
2005
10.1007/11561927_9
DISC
Keywords
Field
DocType
efficient reduction,detection latency,application message increase,wait-free termination detection,termination detection algorithm,process crash,termination detection problem,fault-sensitive termination detection algorithm,perfect failure detector,m application message,fault-tolerant termination detection algorithm,message complexity,distributed system,failure detector,fault tolerant,distributed computing
Failure detector,Crash,Binary logarithm,Algorithm transformation,Latency (engineering),Computer science,Network topology,Real-time computing,Partition (number theory),Distributed computing,Computation
Conference
Volume
ISSN
ISBN
3724
0302-9743
3-540-29163-6
Citations 
PageRank 
References 
9
0.57
23
Authors
4
Name
Order
Citations
PageRank
Neeraj Mittal144437.04
Felix C. Freiling21415137.30
S. Venkatesan319615.51
Lucia Draque Penso419620.46