Efficient Memory Management for Modern Computing Systems
Yang, Hanmei
Abstract
Memory management is critical to system efficiency, particularly in data-intensive applications such as high-performance computing and large-scale machine learning. Effective memory management enhances resource utilization, reduces overhead, and ensures scalability across diverse computing environments. However, as modern architectures become more complex and workloads grow in size, traditional memory management strategies struggle to keep up with the increasing demands. Memory management faces three key challenges. First, maintaining data locality is critical for efficient memory access, but keeping frequently accessed data in fast memory while minimizing unnecessary data movement remains a persistent challenge. Second, as workloads grow in size and complexity, reducing memory consumption requires tailored optimization strategies that account for execution patterns, memory access behaviors, and computational constraints. Third, heterogeneous hardware architectures are often utilized inefficiently: workloads such as LLM training and inference rely primarily on GPUs, leaving CPU resources underutilized. This imbalance results in suboptimal execution and increased energy consumption due to wasted resources. This thesis introduces a series of techniques to enhance memory efficiency across different computing paradigms. From a hardware perspective, we present \NM{}, a NUMA-aware allocator that improves data locality and reduces fragmentation through binding-based memory placement and incremental sharing. From a workload perspective, we leverage 4/6-bit Microscaling (MX) formats for low-precision training, significantly reducing memory consumption while preserving model quality.
Bridging hardware and workload adaptability, we propose \OP{} for LLM training and \PL{} for LLM inference, incorporating an offloading-aware execution strategy that dynamically places data and operators across heterogeneous memory resources. With these techniques, we demonstrate that profiling and analyzing both hardware and workload characteristics enable better alignment between memory management strategies and system demands. This alignment not only improves resource utilization but also enhances workload efficiency, ensuring optimal performance across diverse computing environments.
Type
Dissertation (Open Access)
Date
2025-05
License
Attribution-NonCommercial-ShareAlike 4.0 International
http://creativecommons.org/licenses/by-nc-sa/4.0/
Embargo Lift Date
2026-05-16