Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

ORCID

https://orcid.org/0000-0002-3600-9432

Access Type

Open Access Thesis

Document Type

thesis

Embargo Period

3-1-2022

Degree Program

Electrical & Computer Engineering

Degree Type

Master of Science in Electrical and Computer Engineering (M.S.E.C.E.)

Year Degree Awarded

2021

Month Degree Awarded

September

Abstract

Increasing number of cores in chip multiprocessors (CMP) result in increasing traffic to last-level cache (LLC). Without commensurate increase in LLC bandwidth, such traffic cannot be sustained resulting in loss of performance. Further, as the number of cores increases, it is necessary to scale up the LLC size; otherwise, the LLC miss rate will rise, resulting in a loss of performance. Unfortunately, for a unified LLC with uniform cache access time, access latency increases with cache size, resulting in performance loss. Previously, researchers have proposed partitioning the cache into multiple smaller caches interconnected by a communication network which increases aggregate cache bandwidth but causes non-uniform access latency. Such a cache architecture is called non-uniform cache architecture (NUCA). While NUCA addresses the LLC bandwidth issue, partitioning by itself does not address the access latency problem. Consequently, researchers have previously considered data placement techniques to improve access latency. However, earlier data placement

work did not account for the frequency with which specific memory references are accessed. A major reason for that is access frequency for all memory references is difficult to track. In this research, we present a hardware-assisted solution called ACTION (Adaptive Cache Block Migration) to track the access frequency of individual memory references and prioritize their placement closer to the affine core. ACTION mechanism implements cache block migration when there is a detectable change in access frequencies due to a change in the program phase. To keep the hardware overhead low, ACTION counts access references in the LLC stream using a simple and approximate method, and uses simple algorithms for placement and migration. We tested ACTION on a 4-core CMP with a 5x5 mesh LLC network implementing a partitioned D-NUCA against workloads exhibiting distinct asymmetry in cache block access frequency. Our simulation results indicate that ACTION can improve CMP performance by as much as 8% over the state-of-the-art (SOTA) solutions.

DOI

https://doi.org/10.7275/24465367.0

First Advisor

Sandip Kundu

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Available for download on Tuesday, March 01, 2022

Share

COinS