Towards Privacy Preserving Intelligent Systems
Time: Fri 2023-06-02 09.00
Location: E2, Lindstedtsvägen 3, Stockholm
Video link: https://kth-se.zoom.us/j/66441177033
Subject area: Computer Science
Doctoral student: Md Sakib Nizam Khan, Teoretisk datalogi, TCS
Opponent: Professor Vicenc Torra, Umeå University
Supervisor: Professor Sonja Buchegger, Teoretisk datalogi, TCS
Intelligent systems, i.e., digital systems containing smart devices that can gather, analyze, and act in response to the data they collect from their surrounding environment, have progressed from theory to application, especially in the last decade, thanks to recent technological advances in sensors and machine learning. These systems can make decisions on users' behalf dynamically by learning their behavior over time. The number of such smart devices in our surroundings is increasing rapidly. Since these devices in most cases handle privacy-sensitive data, privacy concerns are increasing at a similar rate. However, privacy research has not kept pace with these developments. Moreover, the systems are heterogeneous (e.g., in terms of form factor, energy, processing power, and use case scenarios) and continuously evolving, which makes the privacy problem even more challenging.
In this thesis, we identify open privacy problems of intelligent systems and later propose solutions to some of the most prominent ones. We first investigate privacy concerns in the context of data stored on a single smart device. We identify that ownership change of a smart device can leak privacy-sensitive information stored on the device. To solve this, we propose a framework to enhance the privacy of owners during ownership change of smart devices based on context detection and data encryption. Moving from the single-device setting to more complex systems involving multiple devices, we conduct a systematic literature review and a review of commercial systems to identify the unique privacy concerns of home-based health monitoring systems. From the review, we distill a common architecture covering most commercial and academic systems, including an inventory of what concerns they address, their privacy considerations, and how they handle the data. Based on this, we then identify potential privacy intervention points of such systems.
For the publication of collected data or a machine-learning model trained on such data, we explore the potential of synthetic data as a tool for achieving a better trade-off between privacy and utility compared to traditional privacy-enhancing approaches. We perform a thorough assessment of the utility of synthetic tabular data. Our investigation reveals that none of the commonly used utility metrics for assessing how well synthetic data corresponds to the original data can predict whether synthetic data achieves utility similar to the original data for an arbitrary univariate or multivariate statistical analysis that is not known beforehand. For machine learning-based classification tasks, however, the metric Confidence Interval Overlap shows a strong correlation with how similarly the machine learning models (i.e., those trained on synthetic vs. original data) perform. Concerning privacy, we explore membership inference attacks against machine learning models, which aim to determine whether some (or someone's) particular data was used to train the model. We find that training on synthetic data instead of original data can significantly reduce the effectiveness of membership inference attacks. For image data, we propose a novel methodology to quantify, improve, and tune the privacy-utility trade-off of the synthetic image data generation process compared to traditional approaches.
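To illustrate the kind of metric discussed above: Confidence Interval Overlap is commonly described as the average fraction of each confidence interval (one estimated on the original data, one on the synthetic data) that is covered by their intersection. The following is a minimal sketch under stated assumptions, not necessarily the formulation used in the thesis: the function names `mean_ci` and `ci_overlap` are hypothetical, the interval is a normal-approximation interval for a mean, and non-overlapping intervals are scored as 0.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci(values, z=1.96):
    """Normal-approximation confidence interval for the mean (assumed estimator)."""
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return (m - half_width, m + half_width)


def ci_overlap(ci_orig, ci_synth):
    """Average share of each interval covered by the intersection of the two.

    Returns 1.0 for identical intervals. Disjoint intervals are scored 0.0
    here, which is an assumption of this sketch.
    """
    lo_o, hi_o = ci_orig
    lo_s, hi_s = ci_synth
    lo_i = max(lo_o, lo_s)  # lower bound of the intersection
    hi_i = min(hi_o, hi_s)  # upper bound of the intersection
    if hi_i <= lo_i:
        return 0.0
    width = hi_i - lo_i
    return 0.5 * (width / (hi_o - lo_o) + width / (hi_s - lo_s))
```

In use, one would compute the same estimate (for example, a column mean via `mean_ci`) on the original and the synthetic table and pass both intervals to `ci_overlap`; values near 1 indicate that the two datasets support similar inferences for that estimate.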
Overall, our exploration in this thesis reveals that there are several open research questions regarding privacy at different phases of the data lifespan of intelligent systems, such as privacy-preserving data storage, possible inferences due to data aggregation, and the quantification and improvement of the privacy-utility trade-off for achieving better utility at an acceptable level of privacy in a data release. The identified privacy concerns and their corresponding solutions presented in this thesis will help the research community to recognize and address remaining privacy concerns in the domain. Solving these concerns will encourage end-users to adopt the systems and enjoy the benefits without having to worry about privacy.