Privacy preserving behaviour learning for the IoT ecosystem
Time: Fri 2021-12-17 14.00
Location: Ka-Sal C (Sven-Olof Öhrvik), Kistagången 16, Kista
Video link: https://kth-se.zoom.us/meeting/register/u5Ysd-qurj4sGdEM-l9Si4c93uwsoh2iKBG8
Subject area: Information and Communication Technology
Doctoral student: Sana Imtiaz, Software and Computer Systems, SCS
Opponent: Professor Omer Rana, Cardiff University, United Kingdom
Supervisors: Professor Vladimir Vlassov, Software and Computer Systems, SCS; Ramin Sadre, Université catholique de Louvain, Belgium; Sarunas Girdzijauskas, Software and Computer Systems, SCS
This work was supported by the Erasmus Mundus Joint Doctorate in Distributed Computing (EMJD-DC) funded by the Education, Audiovisual and Culture Executive Agency (EACEA) of the European Commission under the FPA 2012-0030, and FoFu at KTH.
The Internet of Things (IoT) has enabled a multitude of personal applications and services that help us better understand and improve urban environments and our personal lives. These services rely on the continuous collection and analysis of sensitive, private user data to provide personalised experiences. Among the application areas of IoT, smart health care in particular requires privacy preservation techniques to protect users against privacy-breaching threats such as identification, profiling, localization and tracking, and information linkage. Traditional privacy preservation techniques such as pseudonymization are no longer sufficient for the fast-growing smart health care domain because of the challenges posed by the volume, velocity, and variety of big data. On the other hand, modern privacy preservation techniques come with overheads that can degrade application performance, for example through reduced accuracy, reduced data utility, and increased device resource usage. Privacy preservation techniques (and solutions) must therefore be selected according to the nature of the data, the system performance requirements, and the resource constraints, so as to find a proper trade-off between privacy preservation, data utility, and acceptable system performance in terms of accuracy, runtime, and resource consumption.
In this work, we investigate different privacy preservation solutions and measure how introducing selected solutions affects the different components of the IoT ecosystem in terms of data utility and system performance. We implement and evaluate our proposed approaches, and illustrate the results, using real-world and synthetic privacy-preserving smart health care datasets. First, we provide a detailed taxonomy and analysis of privacy preservation techniques and solutions, which can serve as a guideline for selecting appropriate techniques according to the nature of the data and the system requirements. Next, to facilitate privacy-preserving data sharing, we present and implement a method for creating realistic, synthetic, privacy-preserving smart health care datasets using Generative Adversarial Networks and Differential Privacy. We then present and develop PyDPLib, a differential privacy library for privacy-preserving data analytics, with health care data as a use case.
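Both the synthetic-data generator and PyDPLib build on differential privacy. As a minimal sketch of the basic idea, assuming the textbook Laplace mechanism (this is not PyDPLib's API, which is not described here), the following releases an epsilon-differentially private mean of bounded health readings. The function names `laplace_noise` and `dp_mean` are hypothetical. The sketch assumes that neighbouring datasets differ by replacing one record, so the dataset size n is public and the clipped mean has sensitivity (upper - lower) / n.

```python
import random
from typing import Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale  # expovariate(rate) has mean 1 / rate = scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_mean(values: Sequence[float], lower: float, upper: float,
            epsilon: float, rng: random.Random) -> float:
    """Epsilon-DP mean of values clipped to [lower, upper] (Laplace mechanism).

    Hypothetical helper. Assumes n = len(values) is public (replace-one
    neighbouring datasets), so changing one record moves the clipped mean
    by at most (upper - lower) / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    heart_rates = [62.0, 71.0, 88.0, 95.0, 58.0]  # illustrative values, not real data
    print(dp_mean(heart_rates, lower=40.0, upper=180.0, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but a noisier mean, which is a simple instance of the privacy versus data utility trade-off discussed above.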
To find proper trade-offs between necessary privacy preservation, device resource consumption, and application accuracy, we present and implement a novel approach, with corresponding algorithms and an end-to-end system pipeline, for reconfigurable data privacy in machine learning on resource-limited computing devices. Our evaluation results show that, while providing the required level of privacy, the proposed approach achieves savings of up to 26.21% in memory, 16.67% in CPU instructions, and 30.5% in network bandwidth compared with making all the data private. We also present and implement an end-to-end solution for privacy-preserving time-series forecasting of user health data streams using Federated Learning and Differential Privacy. This solution finds a proper trade-off between necessary privacy preservation, application accuracy, and runtime, and in the best case reduces the prediction accuracy of the trained models by about 2%.
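In federated learning, each user's health data stream stays on the device and only model updates are shared. A common way to add differential privacy is to clip each client's update and add calibrated Gaussian noise when aggregating (the DP-FedAvg pattern). The following is a minimal sketch of that pattern, not the pipeline developed in this work: it assumes a linear autoregressive forecaster over series scaled to roughly [0, 1], and the names `local_update`, `clip_update`, and `dp_fedavg_round` and the parameters `clip_norm` and `noise_multiplier` are hypothetical. It omits a privacy accountant, so it does not compute the resulting (epsilon, delta) guarantee.

```python
import math
import random
from typing import List

Vector = List[float]


def local_update(global_model: Vector, series: List[float],
                 lr: float = 0.01, epochs: int = 5) -> Vector:
    """Hypothetical client step: SGD on a linear AR(p) forecaster.

    The model is [w_1, ..., w_p, b] and predicts y_t from the previous
    p values. Returns the weight change (local minus global).
    """
    p = len(global_model) - 1  # last entry is the bias term
    w = list(global_model)
    for _ in range(epochs):
        for t in range(p, len(series)):
            x = series[t - p:t]
            pred = sum(wi * xi for wi, xi in zip(w[:p], x)) + w[p]
            err = pred - series[t]  # gradient of 0.5 * err**2 w.r.t. pred
            for i in range(p):
                w[i] -= lr * err * x[i]
            w[p] -= lr * err
    return [a - g for a, g in zip(w, global_model)]


def clip_update(update: Vector, clip_norm: float) -> Vector:
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * factor for x in update]


def dp_fedavg_round(global_model: Vector, client_updates: List[Vector],
                    clip_norm: float, noise_multiplier: float,
                    rng: random.Random) -> Vector:
    """Hypothetical server step: clip, sum, add Gaussian noise, average.

    Clipping bounds each client's contribution to the sum by clip_norm,
    so Gaussian noise with std noise_multiplier * clip_norm is added to the sum.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    k = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    noisy_avg = [(s + rng.gauss(0.0, sigma)) / k for s in summed]
    return [w + d for w, d in zip(global_model, noisy_avg)]
```

For example, with `rng = random.Random(0)` and one normalised series per device, a round is `model = dp_fedavg_round(model, [local_update(model, s) for s in series_per_device], clip_norm=1.0, noise_multiplier=1.1, rng=rng)`. Raising `noise_multiplier` strengthens privacy at the cost of forecast accuracy, which is the trade-off described above.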