TY - GEN
T1 - Wait-free clock synchronization
AU - Dolev, Shlomi
AU - Welch, Jennifer L.
PY - 1993/1/1
Y1 - 1993/1/1
N2 - Multi-processor computer systems with many processors are becoming increasingly important as vehicles for solving computationally expensive problems. Synchronization among the processors is achieved with a variety of clock configurations. A new notion of fault-tolerance for clock synchronization algorithms is defined, tailored to the requirements and failure patterns of multiprocessors. Algorithms in this class can tolerate any number of processors that can fail by ceasing operation for an arbitrary time interval and then resuming operation (with or) without recognizing that a fault has occurred. These algorithms guarantee that, for some fixed k, once a processor P has been working correctly for at least k time, then as long as it continues to work correctly, (1) P does not adjust its clock, and (2) P's clock agrees with the clock of every other processor that has also been working correctly for at least k time. Because a working processor must synchronize in a fixed amount of time regardless of the actions of the other processors, these algorithms are called wait-free. Four wait-free clock synchronization algorithms are presented for various system settings. Two of them are both wait-free and self-stabilizing. An algorithm is self-stabilizing if it is resilient to any number and any type of faults in the history in the following sense: starting with an arbitrary state of the system, a self-stabilizing algorithm eventually reaches a point after which it correctly performs its task. The existence of an algorithm that can tolerate any number of faulty processors and work correctly when started in an arbitrary system state is somewhat surprising.
AB - Multi-processor computer systems with many processors are becoming increasingly important as vehicles for solving computationally expensive problems. Synchronization among the processors is achieved with a variety of clock configurations. A new notion of fault-tolerance for clock synchronization algorithms is defined, tailored to the requirements and failure patterns of multiprocessors. Algorithms in this class can tolerate any number of processors that can fail by ceasing operation for an arbitrary time interval and then resuming operation (with or) without recognizing that a fault has occurred. These algorithms guarantee that, for some fixed k, once a processor P has been working correctly for at least k time, then as long as it continues to work correctly, (1) P does not adjust its clock, and (2) P's clock agrees with the clock of every other processor that has also been working correctly for at least k time. Because a working processor must synchronize in a fixed amount of time regardless of the actions of the other processors, these algorithms are called wait-free. Four wait-free clock synchronization algorithms are presented for various system settings. Two of them are both wait-free and self-stabilizing. An algorithm is self-stabilizing if it is resilient to any number and any type of faults in the history in the following sense: starting with an arbitrary state of the system, a self-stabilizing algorithm eventually reaches a point after which it correctly performs its task. The existence of an algorithm that can tolerate any number of faulty processors and work correctly when started in an arbitrary system state is somewhat surprising.
UR - http://www.scopus.com/inward/record.url?scp=0027845904&partnerID=8YFLogxK
U2 - 10.1145/164051.164066
DO - 10.1145/164051.164066
M3 - Conference contribution
AN - SCOPUS:0027845904
SN - 0897916131
SN - 9780897916134
T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
SP - 97
EP - 108
BT - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
PB - ACM
T2 - Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing
Y2 - 15 August 1993 through 18 August 1993
ER -