Thumbnail Image

ACTION : Adaptive Cache Block Migration in Distributed Cache Architectures

Increasing number of cores in chip multiprocessors (CMP) result in increasing traffic to last-level cache (LLC). Without commensurate increase in LLC bandwidth, such traffic cannot be sustained resulting in loss of performance. Further, as the number of cores increases, it is necessary to scale up the LLC size; otherwise, the LLC miss rate will rise, resulting in a loss of performance. Unfortunately, for a unified LLC with uniform cache access time, access latency increases with cache size, resulting in performance loss. Previously, researchers have proposed partitioning the cache into multiple smaller caches interconnected by a communication network which increases aggregate cache bandwidth but causes non-uniform access latency. Such a cache architecture is called non-uniform cache architecture (NUCA). While NUCA addresses the LLC bandwidth issue, partitioning by itself does not address the access latency problem. Consequently, researchers have previously considered data placement techniques to improve access latency. However, earlier data placement work did not account for the frequency with which specific memory references are accessed. A major reason for that is access frequency for all memory references is difficult to track. In this research, we present a hardware-assisted solution called ACTION (Adaptive Cache Block Migration) to track the access frequency of individual memory references and prioritize their placement closer to the affine core. ACTION mechanism implements cache block migration when there is a detectable change in access frequencies due to a change in the program phase. To keep the hardware overhead low, ACTION counts access references in the LLC stream using a simple and approximate method, and uses simple algorithms for placement and migration. We tested ACTION on a 4-core CMP with a 5x5 mesh LLC network implementing a partitioned D-NUCA against workloads exhibiting distinct asymmetry in cache block access frequency. Our simulation results indicate that ACTION can improve CMP performance by as much as 8% over the state-of-the-art (SOTA) solutions.
Research Projects
Organizational Units
Journal Issue
Publisher Version
Embedded videos