TY - GEN

T1 - Wait-free clock synchronization

AU - Dolev, Shlomi

AU - Welch, Jennifer L.

PY - 1993/1/1

Y1 - 1993/1/1

N2 - Multi-processor computer systems with many processors are becoming increasingly important as vehicles for solving computationally expensive problems. Synchronization among the processors is achieved with a variety of clock configurations. A new notion of fault-tolerance for clock synchronization algorithms is defined, tailored to the requirements and failure patterns of multiprocessors. Algorithms in this class can tolerate any number of processors that can fail by ceasing operation for an arbitrary time interval and then resuming operation (with or) without recognizing that a fault has occurred. These algorithms guarantee that, for some fixed k, once a processor P has been working correctly for at least k time, then as long as it continues to work correctly, (1) P does not adjust its clock, and (2) P's clock agrees with the clock of every other processor that has also been working correctly for at least k time. Because a working processor must synchronize in a fixed amount of time regardless of the actions of the other processors, these algorithms are called wait-free. Four wait-free clock synchronization algorithms are presented for various system settings. Two of them are both wait-free and self-stabilizing. An algorithm is self-stabilizing if it is resilient to any number and any type of faults in the history in the following sense: starting with an arbitrary state of the system, a self-stabilizing algorithm eventually reaches a point after which it correctly performs its task. The existence of an algorithm that can tolerate any number of faulty processors and work correctly when started in an arbitrary system state is somewhat surprising.

UR - http://www.scopus.com/inward/record.url?scp=0027845904&partnerID=8YFLogxK

U2 - 10.1145/164051.164066

DO - 10.1145/164051.164066

M3 - Conference contribution

AN - SCOPUS:0027845904

SN - 0897916131

SN - 9780897916134

T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing

SP - 97

EP - 108

BT - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing

PB - ACM

T2 - Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing

Y2 - 15 August 1993 through 18 August 1993

ER -