TY - JOUR
T1 - Self-stabilizing microprocessor
T2 - Analyzing and overcoming soft errors
AU - Dolev, Shlomi
AU - Haviv, Yinnon A.
N1 - Funding Information:
The work of Shlomi Dolev was partially supported by IBM, the Israeli Ministry of Science, Intel, Deutsche Telekom, and the Rita Altura Trust Chair in Computer Sciences. The work of Yinnon A. Haviv was partially supported by Intel, Vaatat, and the Lynn and William Frankel Center for Computer Sciences.
PY - 2006/4/1
Y1 - 2006/4/1
AB - Soft errors are changes in memory values caused by external radiation or electrical noise. Decreasing feature sizes and power usage and shortening the microcycle period all increase the influence of soft errors. Self-stabilizing systems are designed to be started in an arbitrary, possibly corrupted, state (due to, say, soft errors) and to converge to a desired behavior. Self-stabilization is defined by the state space of the components and is essentially a well-founded, clearly defined form of the terms self-healing, automatic recovery, automatic repair, and autonomic computing. To implement a self-stabilizing system, one needs to ensure that the microprocessor that executes the program is self-stabilizing. A self-stabilizing microprocessor copes with any combination of soft errors, converging to perform fetch-decode-execute in fault-free periods. Still, it is important that the microprocessor avoid convergence periods where possible by masking the effect of soft errors immediately. In this work, we present design schemes for a self-stabilizing microprocessor and a new technique for analyzing the effect of soft errors. Previous schemes for analyzing the effect of soft errors were based on simulations. In contrast, our scheme computes a lower bound on microprocessor reliability, enabling the microprocessor designer to evaluate the reliability of the design and to identify reliability bottlenecks. When analyzing the resiliency of digital circuits to soft errors, we examine logical masking, i.e., errors in internal nodes of a circuit that are later masked by the computation. We show that computing the reliability of a circuit when logical masking is taken into account is NP-hard.
KW - Microprocessor
KW - Self-stabilization
KW - Single event upset
KW - Soft errors
UR - http://www.scopus.com/inward/record.url?scp=33645217101&partnerID=8YFLogxK
U2 - 10.1109/TC.2006.61
DO - 10.1109/TC.2006.61
M3 - Article
AN - SCOPUS:33645217101
SN - 0018-9340
VL - 55
SP - 385
EP - 399
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 4
ER -