
A Model Extraction Attack without Reference Training Data

Abstract
With the advancement of deep neural networks (DNNs), their applications have become increasingly widespread. The security of DNNs is a pressing concern: they represent valuable intellectual property due to their intricate structure, costly training data, and extensive training process. Typically, DNNs are deployed as black-box applications, meaning users can only interact with the model through an application programming interface to receive outputs. Even with such restricted access, however, the security of DNNs is not assured. Adversaries can interrogate these black-box models, collecting input-output pairs to steal the DNN's parameters and behavior through model extraction attacks. This renders the victim DNN's trained parameters transparent, paving the way for further attacks on the model. To delve deeper into the vulnerabilities of DNN models and offer developers new protective strategies, we propose a model extraction attack that uses noise inputs instead of relying solely on real images. We feed iteratively updated random patterns to the victim model, using its outputs to label the noise images for training a surrogate model. Our approach combines efficient sampling rules with a more effective way of learning a weighted loss over images near decision boundaries, enabling us to replicate the victim model's structure more precisely and improve the surrogate model's generalizability. The weighted loss function focuses on images that undergo more forgetting events, yielding the best results in terms of both training-set size and training time. Because our method requires neither reference data nor a large public dataset, it saves computing resources and better fits real-world scenarios.
Our results indicate that, using simple random-pattern images, as few real images as possible, and a weighted loss function, our method achieves 94.58% of the victim model's accuracy with 5.14 hours of training time and 58,880 images.
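The pipeline the abstract describes — probing the black-box victim with random-pattern images, harvesting its labels, and up-weighting samples that undergo forgetting events during surrogate training — can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `victim_label`, `make_query_set`, and the `alpha`-scaled weighting formula are hypothetical names and an assumed weighting form, and the victim here is a toy stand-in for the real API.

```python
import numpy as np

rng = np.random.default_rng(0)

def victim_label(images):
    # Stand-in for the black-box victim API (hypothetical): labels an
    # image by the sign of its mean pixel value, returning 0 or 1.
    return (images.reshape(len(images), -1).mean(axis=1) > 0).astype(int)

def make_query_set(n, shape=(8, 8)):
    # Random-pattern probe images used in place of reference training data.
    return rng.standard_normal((n, *shape))

def update_forgetting_events(correct_now, correct_prev, counts):
    # A sample undergoes a "forgetting event" when it flips from correctly
    # to incorrectly classified between consecutive surrogate epochs.
    return counts + (correct_prev & ~correct_now).astype(int)

def weighted_mean_loss(per_sample_loss, counts, alpha=1.0):
    # Assumed weighting form: up-weight frequently forgotten samples in
    # proportion to their normalized forgetting count.
    weights = 1.0 + alpha * counts / max(counts.max(), 1)
    return float(np.mean(weights * per_sample_loss))
```

In use, each epoch would query `victim_label` on a fresh `make_query_set`, record which surrogate predictions flipped via `update_forgetting_events`, and back-propagate `weighted_mean_loss` so that hard, frequently forgotten samples near the decision boundary dominate the update.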
Type
Thesis (Open Access)
Date
2024-09
License
Attribution-ShareAlike 4.0 International
http://creativecommons.org/licenses/by-sa/4.0/