
A Model Extraction Attack without Reference Training Data

Abstract
With the advancement of deep neural networks (DNNs), their applications have become increasingly widespread. The security of DNNs is a pressing concern: they represent valuable intellectual property due to their intricate structure, costly training data, and extensive training process. Typically, DNNs are deployed as black-box applications, meaning users can only interact with the model through an application programming interface to receive outputs. Even with such restricted access, however, the security of DNNs is not assured. Adversaries can interrogate these black-box models, collecting input-output pairs to steal the DNN's parameters and behavior through model extraction attacks. This renders the victim DNN's trained parameters transparent, paving the way for further attacks on the model. To delve deeper into the vulnerabilities of DNN models and offer developers new protective strategies, we propose a model extraction attack that uses noise inputs instead of relying solely on real images. We feed iteratively updated random patterns to the victim model, using its outputs to label the noise images for training a surrogate model. Our approach combines efficient sampling rules with a more effective way of learning a weighted loss over images near decision boundaries, enabling us to replicate the victim model's structure more precisely and improve the surrogate model's generalizability. The weighted loss function focuses on images that undergo more forgetting events, yielding the best results in terms of both training-set size and training time. Because our method requires neither reference data nor a large public dataset, it saves computing resources and better fits real-world scenarios.
Our results indicate that, using simple random-pattern images, as few real images as possible, and a weighted loss function, our method achieves 94.58% of the victim model's accuracy with 5.14 hours of training time and 58,880 images.
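The pipeline the abstract describes — probing the black-box victim with random-pattern images, harvesting its labels, and up-weighting samples that undergo forgetting events during surrogate training — can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `victim_label`, `make_query_set`, and the `alpha`-scaled weighting formula are hypothetical names and an assumed weighting form, and the victim here is a toy stand-in for the real API.

```python
import numpy as np

rng = np.random.default_rng(0)

def victim_label(images):
    # Stand-in for the black-box victim API (hypothetical): labels an
    # image by the sign of its mean pixel value, returning 0 or 1.
    return (images.reshape(len(images), -1).mean(axis=1) > 0).astype(int)

def make_query_set(n, shape=(8, 8)):
    # Random-pattern probe images used in place of reference training data.
    return rng.standard_normal((n, *shape))

def update_forgetting_events(correct_now, correct_prev, counts):
    # A sample undergoes a "forgetting event" when it flips from correctly
    # to incorrectly classified between consecutive surrogate epochs.
    return counts + (correct_prev & ~correct_now).astype(int)

def weighted_mean_loss(per_sample_loss, counts, alpha=1.0):
    # Assumed weighting form: up-weight frequently forgotten samples in
    # proportion to their normalized forgetting count.
    weights = 1.0 + alpha * counts / max(counts.max(), 1)
    return float(np.mean(weights * per_sample_loss))
```

In use, each epoch would query `victim_label` on a fresh `make_query_set`, record which surrogate predictions flipped via `update_forgetting_events`, and back-propagate `weighted_mean_loss` so that hard, frequently forgotten samples near the decision boundary dominate the update.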
Type
Thesis (Open Access)
Date
2024-09
License
Attribution-ShareAlike 4.0 International
http://creativecommons.org/licenses/by-sa/4.0/