Title
An FSM-based monitoring technique to differentiate between follow-up and original errors in safety-critical distributed embedded systems
Abstract
Distributed embedded systems are now employed in many safety-critical applications such as X-by-Wire. These systems are composed of several nodes interconnected by a network. Studies show that a transient fault in the communication controller of a network node can lead to errors in the fault site node (called original errors) and/or in the neighbor nodes (called follow-up errors). The communication controller of a network node can be halted due to an error, which may be a follow-up error. In this situation, a follow-up error halts the correct operation of a fault-free controller while the fault site node, i.e., the faulty controller, continues its operation. In this paper, an analysis shows that the occurrence probability of follow-up errors in communication protocols is non-negligible. Consequently, it is important to provide a technique that recognizes the nature of an error, i.e., original or follow-up, in each node. This paper proposes a novel low-cost monitoring technique to differentiate follow-up errors from original errors. The proposed technique is based on monitoring the operational states of a communication controller. In this paper, the technique is applied to the FlexRay protocol. However, it is applicable to any communication protocol with an FSM-based description, such as FlexRay, TTP/C, and TT-Ethernet. To evaluate the monitoring technique, a FlexRay-based network of four nodes was designed and implemented. The low-cost monitoring technique was also implemented inside each node of the network. A total of 135,600 transient bit-flip faults were injected into the communication controller of one node. The results showed that about 6.0% of the injected faults led to original errors; the corresponding figure for follow-up errors was about 6.1%. The results also showed that the accuracy of the proposed technique in differentiating between follow-up and original errors is about 97%, with a hardware overhead of only 1.4%. This level of accuracy and cost makes the proposed technique a feasible solution to enhance the reliability of communication controllers.
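The abstract does not spell out how the monitor decides between the two error classes, so the following is only a minimal sketch of the general idea of watching a controller's operational states. The state names follow the FlexRay protocol operation control states, but the transition table is a simplified assumption, and the classification rule (an illegal transition is attributed to a fault inside the node, a legal transition into a degraded state is attributed to disturbances from other nodes) is a hypothetical illustration, not the rule used in the paper. The names `LEGAL`, `DEGRADED`, `classify`, and `monitor` are invented for this example.

```python
from enum import Enum, auto


class State(Enum):
    DEFAULT_CONFIG = auto()
    CONFIG = auto()
    READY = auto()
    WAKEUP = auto()
    STARTUP = auto()
    NORMAL_ACTIVE = auto()
    NORMAL_PASSIVE = auto()
    HALT = auto()


# ASSUMPTION: simplified transition relation, not the full protocol FSM.
LEGAL = {
    State.DEFAULT_CONFIG: {State.CONFIG},
    State.CONFIG: {State.READY},
    State.READY: {State.CONFIG, State.WAKEUP, State.STARTUP},
    State.WAKEUP: {State.READY},
    State.STARTUP: {State.NORMAL_ACTIVE, State.READY},
    State.NORMAL_ACTIVE: {State.NORMAL_PASSIVE, State.HALT},
    State.NORMAL_PASSIVE: {State.NORMAL_ACTIVE, State.HALT},
    State.HALT: {State.DEFAULT_CONFIG},
}

# ASSUMPTION: states a healthy controller enters when it observes errors.
DEGRADED = {State.NORMAL_PASSIVE, State.HALT}


def classify(prev, curr):
    """Hypothetical rule: classify one observed state change."""
    if curr == prev:
        return "none"
    if curr not in LEGAL[prev]:
        # The FSM itself misbehaved, so suspect a fault inside this node.
        return "original"
    if curr in DEGRADED:
        # The FSM reacted legally to an error condition seen from outside.
        return "follow-up"
    return "none"


def monitor(trace):
    """Return (index, verdict) for every suspicious step in a state trace."""
    verdicts = []
    for i in range(1, len(trace)):
        verdict = classify(trace[i - 1], trace[i])
        if verdict != "none":
            verdicts.append((i, verdict))
    return verdicts
```

Under these assumptions, a trace that moves legally from normal active to halt is flagged as a follow-up error, while a jump from config straight to normal active is flagged as an original error. A hardware monitor would presumably implement the same kind of check as a small comparator next to the controller's state register.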
Year: 2011
DOI: 10.1016/j.mejo.2011.04.003
Venue: Microelectronics Journal
Keywords: fsm-based monitoring technique, follow-up error, communication protocol, novel low-cost monitoring technique, distributed embedded systems, original error, fault site node, follow-up errors, proposed technique, communication controller, error propagation, transient faults, fsm-based monitoring, monitoring technique, low-cost monitoring technique, flexray protocol, network node, embedded system
Field: FlexRay, Control theory, Propagation of uncertainty, Node (networking), Real-time computing, Engineering, Embedded system, Communications protocol
DocType: Journal
Volume: 42
Issue: 6
ISSN: Microelectronics Journal
Citations: 0
PageRank: 0.34
References: 26
Authors (2)
Name                     Order  Citations  PageRank
Yasser Sedaghat          1      36         6.69
Seyed Ghassem Miremadi   2      531        50.32