Caching is a simple yet powerful technique that has significantly improved the performance of many computer systems, from internet content delivery and domain name systems to CPUs and database systems. The basic idea is to store frequently accessed data locally so that future requests for that data can be served more quickly. For example, a Content Delivery Network (CDN) such as Akamai deploys thousands of edge caches across the globe so that end-user requests can be served from a nearby cache rather than a remote origin server, which significantly reduces latency. With billions of users accessing the internet every day to download videos, documents, images, web pages, and software, caching has become ubiquitous and critical to the functioning of the internet. In this thesis, we make several contributions to caching research by addressing key challenges in content caching systems. Specifically, we improve the fault-tolerance mechanism of a CDN cache cluster, develop cache algorithms tailored to different content types, and propose synthetic trace generation tools for evaluating cache performance in the absence of production traces.

In the first part of this thesis, we identify a limitation in the state-of-the-art fault-tolerance mechanism used in CDN cache clusters, which relies on object replication, and demonstrate that this approach is both inefficient and ineffective. To address this, we develop C2DN (Coded-CDN), a new fault-tolerance mechanism that leverages erasure codes. We show that C2DN achieves an 11% lower byte miss ratio than object replication, demonstrating its greater efficiency.
Additionally, C2DN eliminates unavailability-induced miss ratio spikes, making it more effective as well. Together, these results show the advantages of erasure coding over the state-of-the-art object replication approach for fault tolerance in CDN cache clusters.

In the second part of this thesis, we focus on the design and implementation of domain-aware cache algorithms, which take into account the unique characteristics of the cached content. Our experiments show that domain-aware cache algorithms outperform traditional one-size-fits-all approaches such as LRU and FIFO. Specifically, we introduce two novel domain-aware cache algorithms, GRADES and MM-CACHE. GRADES is a gradient descent-based approximate-caching algorithm designed for feature-based caching systems, in which the cache can respond with content whose features are similar to those of the requested content. Additionally, we design and implement cache algorithms for a multimedia delivery system that can perform on-the-fly super-resolution and down-sampling of locally available media content to serve requests more efficiently.

In the last part of this thesis, we tackle the challenge of obtaining realistic production traces for cache simulations. While production traces collected from live internet caching proxies provide valuable insights into user behavior and system performance, they are typically private and proprietary, which makes them difficult to obtain and use in research. To address this challenge, we develop two synthetic trace generation tools, TRAGEN and JEDI. These tools generate synthetic traces that closely mimic the object-level and cache-level properties of the original production traces, making them suitable substitutes for cache simulations.
With TRAGEN and JEDI, system designers and researchers can test and validate new caching algorithms and architectures using realistic synthetic traces, overcoming a major obstacle in caching research. Our results show that the synthetic traces generated by these tools are similar to the original production traces, making them a reliable and practical alternative for cache simulations.
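
To make the terms used above concrete, the following is a minimal illustrative sketch of a trace-driven cache simulation: it replays a request trace through a byte-capacity LRU cache and reports the object miss ratio and the byte miss ratio. It is not the thesis's implementation of C2DN, GRADES, MM-CACHE, TRAGEN, or JEDI. The function name `simulate_lru`, the `(object_id, size_bytes)` trace format, and the toy trace are assumptions chosen for illustration only.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (object_id, size_bytes) requests through a byte-capacity LRU cache.

    Returns (object_miss_ratio, byte_miss_ratio).
    Hypothetical helper for illustration; sizes are assumed to be positive.
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0               # bytes currently stored in the cache
    requests = misses = 0
    total_bytes = miss_bytes = 0

    for obj, size in trace:
        requests += 1
        total_bytes += size
        if obj in cache:
            cache.move_to_end(obj)  # hit: mark as most recently used
            continue
        misses += 1
        miss_bytes += size
        if size > capacity_bytes:
            continue  # object can never fit, so it is not admitted
        # Evict least recently used objects until the new object fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[obj] = size
        used += size

    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return misses / requests, miss_bytes / total_bytes


# Toy trace (made up): one large object "a" and two small objects "b" and "c".
toy_trace = [("a", 6), ("b", 2), ("a", 6), ("c", 2), ("b", 2), ("a", 6)]
object_miss_ratio, byte_miss_ratio = simulate_lru(toy_trace, capacity_bytes=8)
```

On this toy trace, five of the six requests miss (object miss ratio of about 0.83), while 18 of the 24 requested bytes miss (byte miss ratio of 0.75). The two metrics differ whenever object sizes differ, which is why CDN evaluations such as the one summarized above commonly report the byte miss ratio.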