An Efficient Privacy-Preserving Framework for Video Analytics

With the proliferation of video content from surveillance cameras, social media, and live streaming services, the need for efficient video analytics has grown immensely. In recent years, machine learning based computer vision algorithms have achieved great success in a variety of video analytic tasks. In particular, neural network models dominate visual tasks such as image and video classification, object recognition, object detection, and object tracking. However, compared with classic computer vision algorithms, machine learning based methods are usually far more compute-intensive, and many state-of-the-art models require powerful servers. With the development of cloud computing infrastructure, machine learning techniques have become accessible everywhere through the Internet: an end user simply uploads their data to a cloud server and benefits from advances in machine learning without owning a powerful device, offloading the heavy workload to the cloud. Cloud-based video analytics faces two major challenges. First, video analytics requires a huge amount of compute resources and can be slow even on powerful servers, which limits the use of neural network based solutions for real-time video analytics. Second, uploading user videos to the cloud reveals private information about users, and existing privacy-preserving inference methods rely heavily on cryptographic operations that are compute and communication intensive.

This dissertation first addresses the workload problem of video analytics. Compared with analytic tasks on individual images, nearby frames in a video are usually highly correlated; in other words, there is information redundancy across video frames. We exploit this redundancy and design a system, PFad, for live video analytics that adaptively adjusts the video configuration used for neural network processing, such as the frame rate and resolution.
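The idea of adapting frame rate and resolution to scene dynamics can be sketched as follows. This is an illustrative sketch only: the configuration list, movement feature, and thresholds are hypothetical stand-ins, not PFad's actual prediction model.

```python
# Hypothetical sketch of profiling-free configuration adaptation:
# pick a frame rate and resolution from how fast objects move
# between consecutive frames (names and thresholds are illustrative).

CONFIGS = [  # (frames per second, frame height in pixels)
    (30, 1080),
    (15, 720),
    (5, 480),
]

def avg_displacement(prev_boxes, cur_boxes):
    """Mean center displacement (pixels) of matched object boxes."""
    dists = []
    for (px, py, pw, ph), (cx, cy, cw, ch) in zip(prev_boxes, cur_boxes):
        dists.append(((px - cx) ** 2 + (py - cy) ** 2) ** 0.5)
    return sum(dists) / len(dists) if dists else 0.0

def select_config(prev_boxes, cur_boxes):
    """Fast motion -> expensive config; slow motion -> cheap config."""
    speed = avg_displacement(prev_boxes, cur_boxes)
    if speed > 20:
        return CONFIGS[0]
    if speed > 5:
        return CONFIGS[1]
    return CONFIGS[2]
```

A slowly changing scene thus gets sampled at 5 fps and 480p, while fast motion triggers the full 30 fps, 1080p configuration.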
In this work, we propose to perform configuration adaptation without offline profiling and design a corresponding configuration prediction mechanism: configurations are selected by a prediction model based on object movement features. In addition, we reduce latency through resource orchestration on video analytics servers. The key idea of resource orchestration is to batch inference tasks that use the same CNN model and to schedule tasks based on a priority value that estimates their impact on the total latency. We evaluate our system on two video analytic applications, road traffic monitoring and pose detection. The experimental results show that our profiling-free adaptation reduces the workload by 80% compared with state-of-the-art adaptation without lowering accuracy, and reduces the average serving latency by up to 95% compared with profiling-based adaptation.

This dissertation addresses the privacy issue in two steps. First, we propose PIPO, which protects the privacy of frame-level information. The key idea of PIPO is to accelerate the operations in neural network models by avoiding expensive cryptographic operations as much as possible. In particular, the client preprocesses the inference by having convolution performed on one secret share of the input under homomorphic encryption; during online inference, the client only needs to send the remaining secret share of the input to the server, where convolution can be performed with plaintext operations. In addition, PIPO performs the non-linear layers on the client side to protect users' data. To prevent model parameters from being revealed to the client directly, the server performs two reversible operations: multiplying each entry of the convolution results by a scale factor and shuffling the entries. We prove that PIPO preserves the privacy of users' data with a simulation-based argument.
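The secret-shared linear step with scaling and shuffling can be sketched in a few lines. This is a simplified illustration, not the PIPO protocol itself: the convolution is reduced to a matrix-vector product, and the offline homomorphic computation on the first share is replaced by a plaintext stand-in.

```python
# Illustrative sketch of a secret-shared linear layer with the server's
# reversible scale-and-shuffle masking (all details simplified; the
# offline homomorphic part is computed in the clear as a placeholder).
import numpy as np

rng = np.random.default_rng(0)

W = rng.standard_normal((4, 8))   # server's private weights
x = rng.standard_normal(8)        # client's private input

# Offline: the client secret-shares its input, x = x0 + x1.
x0 = rng.standard_normal(8)
x1 = x - x0
# (In PIPO, W @ x0 would be computed once under homomorphic
# encryption; here it is computed in the clear as a stand-in.)
y0 = W @ x0

# Online: the client sends only the share x1; the server uses plaintext ops.
y1 = W @ x1

# To hide W, the server scales each output entry by a random positive
# factor and shuffles the entries; both operations are reversible.
scales = rng.uniform(1.0, 2.0, size=4)
perm = rng.permutation(4)
masked = (scales * (y0 + y1))[perm]

# The server can later undo the mask: unshuffle, then divide by scales.
inv = np.empty(4, dtype=int)
inv[perm] = np.arange(4)
recovered = masked[inv] / scales
assert np.allclose(recovered, W @ x)
```

Since scaling by nonzero factors and permuting entries are both invertible, the masking hides the raw layer outputs (and hence the weights) without losing information the server needs later.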
Further, we show that the resources needed to steal the server's model parameters in PIPO are within the same order of magnitude as those of the prediction API attack, an attack that the client can mount on any inference service where both the input and the inference results are known to the client. Our experiments on well-known neural network architectures show that PIPO improves the inference latency and communication volume by up to 78x and 26x, respectively, compared with Delphi. Building on PIPO, this dissertation proposes Pevas, which supports efficient privacy-preserving video analytics. Pevas exploits the temporal correlation among consecutive frames for both performance and privacy. We propose a privacy-preserving differential CNN inference protocol based on PIPO that transmits and computes on only the changed part of each frame. Pevas not only applies the privacy-preserving protocol to the changed parts, but also hides the positions of those parts. In addition, we design a privacy parameter mechanism for privacy-preserving video analytics. Our experiments with Pevas, using ResNet-50 on real-world videos, show that it improves the inference latency and communication volume by three to four orders of magnitude compared with protocols based on Delphi, CrypTFlow, LLAMA, and Cheetah.
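The differential-frame idea can be sketched as follows. This is a hypothetical illustration of detecting changed regions, not Pevas's actual protocol: the block size, threshold, and change metric are illustrative stand-ins.

```python
# Hypothetical sketch of differential frame processing: only the blocks
# of a frame that changed beyond a threshold would be sent for
# (privacy-preserving) inference; the rest is reused from the previous
# frame's results.
import numpy as np

BLOCK = 16       # block side length in pixels (illustrative)
THRESH = 8.0     # mean absolute difference threshold (illustrative)

def changed_blocks(prev, cur):
    """Return the (row, col) indices of blocks that changed."""
    h, w = cur.shape
    blocks = []
    for r in range(0, h, BLOCK):
        for c in range(0, w, BLOCK):
            diff = np.abs(cur[r:r+BLOCK, c:c+BLOCK].astype(float)
                          - prev[r:r+BLOCK, c:c+BLOCK].astype(float))
            if diff.mean() > THRESH:
                blocks.append((r // BLOCK, c // BLOCK))
    return blocks

prev = np.zeros((64, 64), dtype=np.uint8)
cur = prev.copy()
cur[0:16, 0:16] = 255                 # exactly one block changes
assert changed_blocks(prev, cur) == [(0, 0)]
```

In a surveillance video where most of the scene is static, only a small fraction of blocks change per frame, which is what lets a differential protocol cut both computation and communication so sharply.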