
Fault-tolerant aspects of memory systems

Nicholas S Bowen, University of Massachusetts Amherst

Abstract

Memory system design is important for providing high reliability and availability. This dissertation presents a memory architecture to support checkpoints that can improve reliability, and also algorithms to improve recoverable virtual memory. In addition, two novel techniques of reliability analysis are presented that account for program and operating system behavior.

Checkpoint and rollback recovery is a method that allows a system to tolerate a failure by periodically saving the state and, if an error occurs, rolling back to the prior checkpoint. A technique is proposed that embeds the support for checkpoint and rollback recovery directly into the virtual memory translation hardware. A system with both highly reliable and normal memory enables recoverable virtual memory by placing modified data in the highly reliable memory and read-only data in normal memory. Hybrid algorithms are proposed for use in systems with multiple classes of physical memory; that is, one virtual memory policy for the highly reliable memory and one for the normal memory. These techniques are analyzed with a trace-driven simulation.

Reliability analysis of memories, and its relationship to system reliability, is an important aspect of system design. The dynamic aspects of the memory are very important. Two aspects studied here are memory usage patterns by a program and the memory allocation by the operating system. A new model is developed for the successful execution of a program taking into account memory reference patterns. This model is contrasted with traditional memory reliability calculations, showing that the actual reliability may be better than those calculations predict when program behavior is considered. A new theory to explain correlations between increased workloads and increased failure rates is proposed. The tradeoffs in performance and reliability for memory management policies (e.g., virtual or cache memory) are studied as a function of the block-miss reload time. A very small percentage of the memory is found to contribute to a majority of the unreliability. Two techniques are proposed to dramatically improve reliability: an algorithm called selective scrubbing, and the use of very small amounts of highly reliable memory.
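
The general idea of checkpoint and rollback recovery mentioned in the abstract can be illustrated with a minimal software sketch. This is not the hardware mechanism proposed in the dissertation; the class name `CheckpointedMemory`, its methods, and the page-granular copy-on-write scheme are all assumptions chosen for illustration. The sketch saves a page's old value the first time it is written after a checkpoint, so that a rollback can restore the state as of that checkpoint.

```python
class CheckpointedMemory:
    """Toy page store with copy-on-write checkpoints (illustrative only)."""

    def __init__(self, num_pages):
        self.pages = [0] * num_pages   # current contents, one value per page
        self.saved = {}                # page number -> value at last checkpoint

    def write(self, page, value):
        # Save the checkpointed value the first time a page is modified.
        if page not in self.saved:
            self.saved[page] = self.pages[page]
        self.pages[page] = value

    def checkpoint(self):
        # Commit: the current contents become the new recovery point.
        self.saved.clear()

    def rollback(self):
        # Undo every write made since the last checkpoint.
        for page, old in self.saved.items():
            self.pages[page] = old
        self.saved.clear()


mem = CheckpointedMemory(4)
mem.write(0, 10)
mem.checkpoint()      # state [10, 0, 0, 0] is now the recovery point
mem.write(0, 99)
mem.write(2, 7)
mem.rollback()        # simulate recovery after a detected error
print(mem.pages)      # [10, 0, 0, 0]
```

Only pages modified since the last checkpoint are tracked, which keeps the cost of a checkpoint proportional to the amount of changed data rather than to the total memory size.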

Subject Area

Electrical engineering, Computer science, Operations research

Recommended Citation

Bowen, Nicholas S, "Fault-tolerant aspects of memory systems" (1992). Doctoral Dissertations Available from Proquest. AAI9219409.
https://scholarworks.umass.edu/dissertations/AAI9219409
