Efficient reduction for wait-free termination detection in a crash-prone distributed system - Citegraph

Paper Info

Title
Efficient reduction for wait-free termination detection in a crash-prone distributed system

Abstract
We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm $\mathcal{A}$ that has been designed for a failure-free environment into a termination detection algorithm $\mathcal{B}$ that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash. Let μ(n,M) and δ(n,M) denote the message complexity and detection latency, respectively, of $\mathcal{A}$ when the system has n processes and the underlying computation exchanges M application messages. The message complexity of $\mathcal{B}$ is at most O(n + μ(n,0)) messages per failure more than the message complexity of $\mathcal{A}$. Also, its detection latency is at most O(δ(n,0)) per failure more than that of $\mathcal{A}$. Furthermore, the overhead (that is, the amount of control data piggybacked) on an application message increases by only O(log n) bits per failure. The fault-tolerant termination detection algorithm resulting from the transformation satisfies two desirable properties. First, it can tolerate failure of up to n–1 processes, that is, it is wait-free. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash, that is, it is fault-reactive. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.

Year	DOI	Venue
2005	10.1007/11561927_9	DISC
Keywords	Field	DocType
efficient reduction,detection latency,application message increase,wait-free termination detection,termination detection algorithm,process crash,termination detection problem,fault-sensitive termination detection algorithm,perfect failure detector,m application message,fault-tolerant termination detection algorithm,message complexity,distributed system,failure detector,fault tolerant,distributed computing	Failure detector,Crash,Binary logarithm,Algorithm transformation,Latency (engineering),Computer science,Network topology,Real-time computing,Partition (number theory),Distributed computing,Computation	Conference
Volume	ISSN	ISBN
3724	0302-9743	3-540-29163-6
Citations	PageRank	References
9	0.57	23
Authors
4

Authors (4 rows)

Cited by (9 rows)

References (23 rows)

Name	Order	Citations	PageRank
Neeraj Mittal	1	444	37.04
Felix C. Freiling	2	1415	137.30
S. Venkatesan	3	196	15.51
Lucia Draque Penso	4	196	20.46

1