Title
The Effects of Soft Errors and Mitigation Strategies for Virtualization Servers
Abstract
Virtualized servers make up the majority of cloud computing environments, where they host multiple clients on the same hardware. Many organizations run online applications by renting elastic computing resources in order to match demand while reducing fixed costs. However, such organizations are unlikely to take advantage of these benefits for critical applications, as doing so would expose them to several risks. Among other threats, soft errors are a concern in large-scale reliable servers and are expected to become more frequent as a consequence of smaller transistors and lower operating voltages in integrated circuits. This article characterizes the behavior of virtualized cloud servers in the presence of soft errors. Using fault injection, we collect experimental data to determine the failure modes of applications, operating systems, VMs, and the hypervisor. The analysis exposes distinct failure modes, ranging from crash failures of a single virtual machine to silent data corruption in permanent storage. The most frequent failure mode, observed in 10–30 percent of injected errors, consists of a hang affecting multiple virtual machines. Given that such failures are a primary cause of downtime, we develop and evaluate a recovery mechanism which uses online testing and recovers a server from all hangs by rebooting its hypervisor.
Year
2022
DOI
10.1109/TCC.2020.2973146
Venue
IEEE Transactions on Cloud Computing
Keywords
Virtualization, fault injection, cloud computing, fault tolerance, dependability
DocType
Journal
Volume
10
Issue
2
ISSN
2168-7161
Citations
0
PageRank
0.34
References
18
Authors
4
Name | Order | Citations | PageRank
Frederico Cerveira | 1 | 0 | 2.03
Raul Barbosa | 2 | 9 | 1.08
Henrique Madeira | 3 | 1307 | 122.00
Filipe Araujo | 4 | 214 | 24.63