
Paper Info

Title
Crash Management for Distributed Parallel Systems

Abstract
With the growing complexity of parallel architectures, the probability of system failures grows as well. One approach to this problem is self-healing, one of the self-x features of organic computing. In this context, self-healing means that computer clusters should detect and handle failures automatically. This paper presents a self-healing mechanism based on checkpointing that keeps a cluster operative even if some sites or the connections between them fail. The proposed method has been implemented and tested on the Self Distributing Virtual Machine (SDVM).
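The abstract does not describe the mechanism in detail. As a rough illustration of checkpoint-based self-healing in general, and not the SDVM's actual protocol, the following sketch assumes that each site periodically checkpoints the progress of its tasks to storage that survives the site, that failures are detected by missing heartbeats, and that a presumed-dead site's tasks are restarted on the least-loaded surviving site from their last checkpoint. All names (`Task`, `Site`, `Cluster`, `checkpoint_interval`, `heartbeat_timeout`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical unit of work whose progress can be checkpointed."""
    task_id: str
    total_steps: int
    done_steps: int = 0

    def step(self) -> None:
        if self.done_steps < self.total_steps:
            self.done_steps += 1

    @property
    def finished(self) -> bool:
        return self.done_steps >= self.total_steps


@dataclass
class Site:
    """Hypothetical cluster node; setting `alive` to False simulates a crash."""
    name: str
    tasks: dict = field(default_factory=dict)
    alive: bool = True
    last_heartbeat: int = 0


class Cluster:
    def __init__(self, site_names, checkpoint_interval=3, heartbeat_timeout=2):
        self.sites = {n: Site(n) for n in site_names}
        # task_id -> (done_steps, total_steps); assumed to survive any single site
        self.checkpoints = {}
        self.checkpoint_interval = checkpoint_interval
        self.heartbeat_timeout = heartbeat_timeout
        self.clock = 0

    def submit(self, task, site_name):
        self.sites[site_name].tasks[task.task_id] = task
        self.checkpoints[task.task_id] = (task.done_steps, task.total_steps)

    def tick(self):
        """Advance one time step: live sites work, heartbeat and checkpoint."""
        self.clock += 1
        for site in self.sites.values():
            if not site.alive:
                continue  # a crashed site does no work and sends no heartbeat
            site.last_heartbeat = self.clock
            for task in site.tasks.values():
                task.step()
                if self.clock % self.checkpoint_interval == 0:
                    self.checkpoints[task.task_id] = (task.done_steps, task.total_steps)
        self.detect_and_recover()

    def detect_and_recover(self):
        # The detector only sees heartbeats; a dead link looks like a dead site.
        survivors = [s for s in self.sites.values()
                     if self.clock - s.last_heartbeat < self.heartbeat_timeout]
        if not survivors:
            return  # nothing left to migrate work to
        survivor_names = {s.name for s in survivors}
        for site in self.sites.values():
            if site.name in survivor_names or not site.tasks:
                continue
            # Presumed crashed: restart each task from its last checkpoint elsewhere.
            for task_id in list(site.tasks):
                done, total = self.checkpoints[task_id]
                target = min(survivors, key=lambda s: len(s.tasks))
                target.tasks[task_id] = Task(task_id, total, done)
            site.tasks.clear()

    def all_finished(self):
        return all(t.finished for s in self.sites.values() for t in s.tasks.values())


cluster = Cluster(["A", "B"], checkpoint_interval=3, heartbeat_timeout=2)
cluster.submit(Task("t1", 10), "A")
cluster.submit(Task("t2", 10), "B")
for _ in range(4):
    cluster.tick()
cluster.sites["A"].alive = False  # simulate a crash of site A
while not cluster.all_finished():
    cluster.tick()
print(sorted(cluster.sites["B"].tasks))  # ['t1', 't2']
```

In this run, `t1` loses only the work done after its last checkpoint (step 4) and finishes on site B. Shorter checkpoint intervals lose less work per crash but cost more checkpoint traffic. The sketch also ignores the case where a site that is only cut off by a broken connection keeps running its tasks, which would leave two copies of the same task.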

Year: 2004
Venue: GI-Jahrestagung
Keywords: parallel systems
Field: Crash, Computer science, Parallel computing, Bulk synchronous parallel, Distributed computing
DocType: Conference
Citations: 1
PageRank: 0.39
References: 4
Authors: 2

Authors

Name	Order	Citations	PageRank
Jan Haase	1	16	6.08
Frank Eschmann	2	20	3.56
