Title
Recoverable Mutual Exclusion Under System-Wide Failures.
Abstract
Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references -- expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transformation that can add properties such as critical section re-entry and a strong notion of starvation freedom to any RME algorithm while preserving its asymptotic RMR complexity.
Year
DOI
Venue
2018
10.1145/3212734.3212755
PODC '18: ACM Symposium on Principles of Distributed Computing Egham United Kingdom July, 2018
Keywords
Field
DocType
Mutual exclusion, recovery, fault tolerance, concurrency, shared memory, multi-core algorithms, non-volatile main memory, persistent data structures
Semaphore,Shared memory,Computer science,Upper and lower bounds,Concurrency,Critical section,Theoretical computer science,Model of computation,Time complexity,Mutual exclusion
Conference
ISBN
Citations 
PageRank 
978-1-4503-5795-1
0
0.34
References 
Authors
19
2
Name
Order
Citations
PageRank
Wojciech Golab121017.22
Danny Hendler272644.54