
Access Type

Open Access Thesis

Degree Program

Electrical & Computer Engineering

Degree Type

Master of Science in Electrical and Computer Engineering (M.S.E.C.E.)


Abstract

This thesis addresses the challenge of deploying large Transformer models on resource-constrained edge devices. Because training is slow, the standard prune-then-fine-tune workflow becomes tedious and time-consuming when many pruning configurations must be evaluated. To remedy this, the research proposes a novel composability-based Transformer pruning framework that significantly reduces the time required to fine-tune pruned models across various configurations while maintaining model performance. Unlike traditional approaches, this study exploits the composability between Transformer pruning configurations, uncovering opportunities for computational reuse. The proposed framework leverages this composability, employing techniques similar to knowledge distillation and automating the pruning and fine-tuning process. The framework demonstrated its ability to shorten the time needed to fine-tune a model for a given pruning configuration, making it a practical tool for real-world deployment on edge devices. The outcome of this research is a novel method that offers a fresh perspective on Transformer model compression and a reference for future studies on pruning and fine-tuning of Transformer networks.
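The computational-reuse idea behind the framework can be illustrated with a minimal sketch. The names and structure below are hypothetical (the abstract does not give the framework's API), and real block fine-tuning and the distillation-style training step are replaced by a stub; the sketch only shows why composability pays off: each distinct (layer, pruning ratio) block is fine-tuned once and then reused when assembling any configuration that contains it.

```python
# Hypothetical illustration of composability-based reuse across pruning
# configurations. A configuration assigns one pruning ratio per Transformer
# block; blocks are memoized so each (layer, ratio) pair is tuned only once.

from itertools import product

def fine_tune_block(layer, ratio, cache):
    """Fine-tune one pruned block (stub), memoized by (layer, ratio)."""
    key = (layer, ratio)
    if key not in cache:
        # Stand-in for the expensive distillation-style fine-tuning pass.
        cache[key] = f"block{layer}@{ratio}"
    return cache[key]

def compose_model(config, cache):
    """Assemble a pruned model from per-block fine-tuned components."""
    return [fine_tune_block(layer, ratio, cache)
            for layer, ratio in enumerate(config)]

# Exploring every configuration over 2 ratios and 3 layers: naive
# per-configuration fine-tuning would train 8 * 3 = 24 blocks, while
# composable reuse trains each distinct (layer, ratio) pair once -- only 6.
cache = {}
configs = list(product([0.25, 0.5], repeat=3))
models = [compose_model(c, cache) for c in configs]
print(len(configs), len(cache))  # prints "8 6"
```

In the actual framework the cached artifacts would be fine-tuned weight tensors rather than strings, and a final short fine-tuning pass would typically adapt the composed model end to end; the memoization pattern is what turns work proportional to (configurations x layers) into work proportional to the number of distinct pruned blocks.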


First Advisor

Tongping Liu

Second Advisor

Hui Guan

Third Advisor

Sandip Kundu