TY - GEN
T1 - Self-stabilization overhead
T2 - 7th International Conference on Networked Systems, NETYS 2019
AU - Georgiou, Chryssis
AU - Gustafsson, Robert
AU - Lindhé, Andreas
AU - Schiller, Elad Michael
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Shared memory emulation on distributed message-passing systems can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which uses erasure coding to achieve data redundancy with much lower communication cost than previous algorithmic solutions. Recently, Dolev et al. introduced a version of CAS where transient faults are included in the fault model, making it self-stabilizing. But self-stabilization comes at a cost, so in this work we examine the overhead of the algorithm by implementing a system we call CASSS (CAS Self-Stabilizing). Our system builds on the self-stabilizing version of CAS, along with several other self-stabilizing building blocks. This provides us with a powerful platform to evaluate the overhead and other aspects of the real-world applicability of the algorithm. In our case-study, we evaluated the system performance by running it on the world-wide distributed platform PlanetLab. Our study shows that CASSS scales very well in terms of the number of servers, the number of concurrent clients, as well as the size of the replicated object. More importantly, it shows (a) to have only a constant overhead compared to the traditional CAS algorithm and (b) the recovery period (after the last occurrence of a transient fault) is no more than the time it takes to perform a few client (read/write) operations. Our results suggest that the self-stabilizing variation of CAS, which is CASSS, does not significantly impact efficiency while dealing with automatic recovery from transient faults.
AB - Shared memory emulation on distributed message-passing systems can be used as a fault-tolerant and highly available distributed storage solution or as a low-level synchronization primitive. Cadambe et al. proposed the Coded Atomic Storage (CAS) algorithm, which uses erasure coding to achieve data redundancy with much lower communication cost than previous algorithmic solutions. Recently, Dolev et al. introduced a version of CAS where transient faults are included in the fault model, making it self-stabilizing. But self-stabilization comes at a cost, so in this work we examine the overhead of the algorithm by implementing a system we call CASSS (CAS Self-Stabilizing). Our system builds on the self-stabilizing version of CAS, along with several other self-stabilizing building blocks. This provides us with a powerful platform to evaluate the overhead and other aspects of the real-world applicability of the algorithm. In our case-study, we evaluated the system performance by running it on the world-wide distributed platform PlanetLab. Our study shows that CASSS scales very well in terms of the number of servers, the number of concurrent clients, as well as the size of the replicated object. More importantly, it shows (a) to have only a constant overhead compared to the traditional CAS algorithm and (b) the recovery period (after the last occurrence of a transient fault) is no more than the time it takes to perform a few client (read/write) operations. Our results suggest that the self-stabilizing variation of CAS, which is CASSS, does not significantly impact efficiency while dealing with automatic recovery from transient faults.
UR - http://www.scopus.com/inward/record.url?scp=85075609827&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31277-0_9
DO - 10.1007/978-3-030-31277-0_9
M3 - Conference contribution
AN - SCOPUS:85075609827
SN - 9783030312763
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 131
EP - 147
BT - Networked Systems - 7th International Conference, NETYS 2019, Revised Selected Papers
A2 - Atig, Mohamed Faouzi
A2 - Schwarzmann, Alexander A.
PB - Springer
Y2 - 19 June 2019 through 21 June 2019
ER -