When consensus meets self-stabilization: Self-stabilizing failure-detector, consensus and replicated state-machine

Shlomi Dolev, Ronen I. Kat, Elad M. Schiller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper presents a self-stabilizing failure detector, asynchronous consensus and replicated state-machine algorithm suite, the components of which can be started in an arbitrary state and converge to act as a virtual state-machine. Self-stabilizing algorithms can cope with transient faults. Transient faults can alter the system state to an arbitrary state and hence, cause a temporary violation of the safety property of the consensus. New requirements for consensus that fit the on-going nature of self-stabilizing algorithms are presented. The wait-free consensus (and the replicated state-machine) algorithm presented is a classic combination of a failure detector and a (memory bounded) rotating coordinator consensus that satisfy both eventual safety and eventual liveness. Several new techniques and paradigms are introduced. The bounded memory failure detector abstracts away synchronization assumptions using bounded heartbeat counters combined with a balance-unbalance mechanism. The practically infinite paradigm is introduced in the scope of self-stabilization, where an execution of, say, 2^64 sequential steps is regarded as (practically) infinite. Finally, we present the first self-stabilizing wait-free reset mechanism that ensures eventual safety and can be used in other scopes.
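
As a rough illustration of why 2^64 sequential steps can be treated as practically infinite, the sketch below estimates how long such an execution would take. The step rate of one step per nanosecond and the helper name `years_to_exhaust` are assumptions made for this example only; neither comes from the paper.

```python
# Back-of-the-envelope estimate for the "practically infinite" bound.
# Assumption (not from the paper): a process performs one step per nanosecond.

STEPS = 2 ** 64                           # the bound mentioned in the abstract
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds


def years_to_exhaust(steps_per_second: float, steps: int = STEPS) -> float:
    """Return the number of years needed to execute `steps` sequential steps."""
    return steps / steps_per_second / SECONDS_PER_YEAR


# Hypothetical rate: 10^9 steps per second (one step per nanosecond).
years_at_1ghz = years_to_exhaust(1e9)
print(f"{years_at_1ghz:.1f} years")
```

Under this assumed rate the execution would last several centuries, far longer than the expected lifetime of any deployed system, which is the intuition behind regarding a 64-bit step count as unbounded in practice.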

Original language: English
Title of host publication: Principles of Distributed Systems - 10th International Conference, OPODIS 2006, Proceedings
Publisher: Springer Verlag
Pages: 45-63
Number of pages: 19
ISBN (Print): 9783540499909
State: Published - 1 Jan 2006
Event: 10th International Conference on Principles of Distributed Systems, OPODIS 2006 - Bordeaux, France
Duration: 12 Dec 2006 to 15 Dec 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4305 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Principles of Distributed Systems, OPODIS 2006
Country/Territory: France
City: Bordeaux
Period: 12/12/06 to 15/12/06

Keywords

  • Consensus
  • Distributed Reset
  • Failure Detector
  • Self-Stabilization
  • State-Machine
  • Wait-Free

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
