A Composability-Based Transformer Pruning Framework

Abstract
This thesis addresses the challenge of deploying large Transformer models on resource-constrained edge devices. Because fine-tuning large Transformers is slow, the standard prune-then-fine-tune workflow becomes tedious and time-consuming when many pruning configurations must be evaluated. To remedy this, the research proposes a novel composability-based Transformer pruning framework that aims to significantly reduce the time required to fine-tune pruned models across various configurations while maintaining model performance. Unlike traditional approaches, this study explores the composability between Transformer pruning configurations, revealing opportunities for computational reuse. The proposed framework leverages this composability with techniques similar to knowledge distillation and automates the pruning and fine-tuning process. Experiments demonstrated that the framework shortens the time needed to fine-tune a model for a given pruning configuration, making it a practical tool for real-world deployment on edge devices. The outcome of this research is a novel method that offers a fresh perspective on Transformer model compression and a reference for future studies on the pruning and fine-tuning of Transformer networks.
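The abstract does not spell out the mechanics, but the core idea of composability can be illustrated: a pruned block that appears in several pruning configurations only needs to be fine-tuned once, after which it can be reused when assembling each configuration. The sketch below is a hypothetical illustration, not the thesis's implementation; the prune_layer routine, the (layer index, ratio) cache key, the MSE-based distillation loss, and all hyperparameters are assumptions made for the example.

```python
# Minimal sketch (not the thesis's code) of composability-based pruning:
# each pruned Transformer block is fine-tuned once against its unpruned
# counterpart (KD-style) and cached, then reused across configurations.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def prune_layer(layer: nn.TransformerEncoderLayer, ratio: float) -> nn.Module:
    """Placeholder structured pruning: simply shrinks the FFN width."""
    d_model = layer.linear1.in_features
    d_ff = max(1, int(layer.linear1.out_features * (1.0 - ratio)))
    return nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=d_ff,
                                      batch_first=True)

def distill_block(teacher: nn.Module, student: nn.Module,
                  data: torch.Tensor, steps: int = 10) -> nn.Module:
    """Fine-tune one pruned block to mimic its unpruned counterpart."""
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        with torch.no_grad():
            target = teacher(data)
        loss = F.mse_loss(student(data), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return student

# Cache of fine-tuned pruned blocks keyed by (layer index, pruning ratio):
# this is where composability pays off, since a block tuned once is reused
# in every configuration that contains it.
block_cache = {}

def compose_model(base_layers, config, data):
    layers = []
    for i, ratio in enumerate(config):
        key = (i, ratio)
        if key not in block_cache:
            student = prune_layer(base_layers[i], ratio)
            block_cache[key] = distill_block(base_layers[i], student, data)
        layers.append(copy.deepcopy(block_cache[key]))
    return nn.Sequential(*layers)

base = [nn.TransformerEncoderLayer(32, nhead=4, dim_feedforward=64,
                                   batch_first=True) for _ in range(4)]
x = torch.randn(8, 16, 32)  # (batch, seq_len, d_model)
model_a = compose_model(base, (0.5, 0.5, 0.25, 0.25), x)
model_b = compose_model(base, (0.5, 0.25, 0.25, 0.5), x)  # reuses cached blocks
```

In this toy example the second configuration shares three of its four blocks with the first, so composing it triggers only one additional fine-tuning pass; the thesis's framework presumably applies the same reuse at much larger scale, which is where the reported reduction in fine-tuning time would come from.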
Type
Thesis (Open Access)
Date
2023-09