Abstract | ||
---|---|---|
The high parallelism of future Teradevices, which are going to contain more than 1,000 complex cores on a single die, demands new execution paradigms. Coarse-grained dataflow execution models are able to exploit such parallelism, since they combine side-effect-free execution and reduced synchronization overhead. However, the terascale transistor integration of such future chips makes them orders of magnitude more vulnerable to voltage fluctuation, radiation, and process variations. This means dynamic fault-tolerance mechanisms have to be an essential part of such future systems. In this paper, we present a fault-tolerant architecture for a coarse-grained dataflow system, leveraging the inherent features of the dataflow execution model. Specifically, we provide methods to dynamically detect and manage permanent, intermittent, and transient faults at runtime. Furthermore, we exploit the dataflow execution model for a thread-level recovery scheme. Our results showed that redundant execution of dataflow threads can efficiently make use of underutilized resources in a multi-core system, while the overhead in a fully utilized system stays reasonable. Moreover, thread-level recovery incurred only moderate overhead, even in the case of high fault rates. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/s10766-014-0312-y | International Journal of Parallel Programming |
Keywords | Field | DocType |
Coarse-grained dataflow, Fault tolerance, Fault detection, Recovery, Reliability | Architectural support, Fault detection and isolation, Computer science, Parallel computing, Dataflow, Fault tolerance | Journal |
Volume | Issue | ISSN |
44 | 2 | 1573-7640 |
Citations | PageRank | References |
10 | 0.55 | 40 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sebastian Weis | 1 | 69 | 7.15 |
Arne Garbade | 2 | 64 | 5.34 |
Bernhard Fechner | 3 | 78 | 12.18 |
Avi Mendelson | 4 | 517 | 55.88 |
R. Giorgi | 5 | 123 | 16.60 |
Theo Ungerer | 6 | 1262 | 136.24 |