Title | ||
---|---|---|
Efficient reduction for wait-free termination detection in a crash-prone distributed system |
Abstract | ||
---|---|---|
We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm $\mathcal{A}$ that has been designed for a failure-free environment into a termination detection algorithm $\mathcal{B}$ that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash. Let μ(n,M) and δ(n,M) denote the message complexity and detection latency, respectively, of $\mathcal{A}$ when the system has n processes and the underlying computation exchanges M application messages. The message complexity of $\mathcal{B}$ is at most O(n + μ(n,0)) messages per failure more than the message complexity of $\mathcal{A}$. Also, its detection latency is at most O(δ(n,0)) per failure more than that of $\mathcal{A}$. Furthermore, the overhead (that is, the amount of control data piggybacked) on an application message increases by only O(log n) bits per failure. The fault-tolerant termination detection algorithm resulting from the transformation satisfies two desirable properties. First, it can tolerate failure of up to n–1 processes, that is, it is wait-free. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash, that is, it is fault-reactive. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system. |
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11561927_9 | DISC |
Keywords | Field | DocType |
efficient reduction,detection latency,application message increase,wait-free termination detection,termination detection algorithm,process crash,termination detection problem,fault-sensitive termination detection algorithm,perfect failure detector,m application message,fault-tolerant termination detection algorithm,message complexity,distributed system,failure detector,fault tolerant,distributed computing | Failure detector,Crash,Binary logarithm,Algorithm transformation,Latency (engineering),Computer science,Network topology,Real-time computing,Partition (number theory),Distributed computing,Computation | Conference |
Volume | ISSN | ISBN |
3724 | 0302-9743 | 3-540-29163-6 |
Citations | PageRank | References |
9 | 0.57 | 23 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Neeraj Mittal | 1 | 444 | 37.04 |
Felix C. Freiling | 2 | 1415 | 137.30 |
S. Venkatesan | 3 | 196 | 15.51 |
Lucia Draque Penso | 4 | 196 | 20.46 |