Dependability analysis of fault-tolerant multiprocessor architectures through simulated fault injection

Jeffrey Alan Clark, University of Massachusetts Amherst

Abstract

This dissertation develops a new approach for evaluating the dependability of fault-tolerant computer systems. Dependability has traditionally been evaluated through combinatorial and Markov modeling. These analytical techniques have several limitations that can restrict their applicability. Simulation avoids many of these limitations, allowing system attributes to be represented more precisely than is feasible with analytical modeling. However, the computational demands of simulating a system in detail, at a low abstraction level, currently prohibit the evaluation of high-level dependability metrics such as reliability and availability. The new approach abstracts a system at the architectural level and employs life testing through simulated fault injection to measure dependability accurately and efficiently. The simulation models needed to implement this approach have been derived and integrated into a generalized software testbed called the REliable Architecture Characterization Tool (REACT). The effectiveness of REACT is demonstrated through the analysis of several alternative fault-tolerant multiprocessor architectures. Specifically, two dependability tradeoffs associated with triple-modular redundant (TMR) systems are investigated. The first explores the reliability-performance tradeoff made by voting unidirectionally, instead of bidirectionally, on either memory read or memory write accesses. The second examines the reliability-cost tradeoff made by duplicating, rather than triplicating, memory modules and comparing their outputs via error-detecting codes. Both studies show that, in many cases, only an acceptably small amount of reliability is sacrificed, relative to the original TMR system design, in exchange for potentially large performance increases or cost reductions.
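To illustrate the contrast between combinatorial modeling and life testing by simulation, the following is a minimal, hypothetical sketch; it is not REACT and does not reflect its internal models. It assumes a textbook TMR system with a perfect voter, three identical modules with independent, exponentially distributed failure times (constant failure rate), and no repair. The names `tmr_reliability` and `simulate_tmr_life_test`, as well as the failure rate and mission time, are illustrative assumptions. The analytical value uses the standard 2-out-of-3 formula R_TMR = 3R^2 - 2R^3; the simulated value estimates the same quantity by drawing module failure times and counting surviving trials.

```python
import math
import random


def tmr_reliability(r: float) -> float:
    """Combinatorial reliability of TMR with a perfect voter (assumption):
    the system survives if at least 2 of 3 identical modules survive."""
    return 3 * r**2 - 2 * r**3


def simulate_tmr_life_test(failure_rate: float, mission_time: float,
                           trials: int, seed: int = 0) -> float:
    """Estimate TMR reliability at mission_time by Monte Carlo life testing.

    Each trial draws an exponential failure time for each of three modules
    (constant failure rate, independent failures: both assumptions) and
    counts the system as surviving if at least two modules outlive the mission.
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        failure_times = [rng.expovariate(failure_rate) for _ in range(3)]
        working = sum(t > mission_time for t in failure_times)
        if working >= 2:
            survived += 1
    return survived / trials


if __name__ == "__main__":
    # Hypothetical parameters: failure rate per hour and mission time in hours.
    lam, t = 1e-3, 500.0
    r_module = math.exp(-lam * t)  # single-module reliability under the exponential model
    print(f"analytical TMR reliability: {tmr_reliability(r_module):.4f}")
    print(f"simulated TMR reliability:  {simulate_tmr_life_test(lam, t, 100_000):.4f}")
```

In this toy case the two estimates should agree closely; the point of a simulation-based approach is that it can keep working when assumptions such as a perfect voter or independent failures no longer hold and a closed-form model becomes awkward. The formula also shows a well-known property of TMR: it is more reliable than a single module only when module reliability exceeds 0.5.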

Subject Area

Electrical engineering; Computer science

Recommended Citation

Clark, Jeffrey Alan, "Dependability analysis of fault-tolerant multiprocessor architectures through simulated fault injection" (1993). Doctoral Dissertations Available from Proquest. AAI9408266.
https://scholarworks.umass.edu/dissertations/AAI9408266