Learning and Control Strategies for Cyber-physical Systems: From Wireless Control over Deep Reinforcement Learning to Causal Identification
Time: Wed 2020-12-09 16.00
Location: Zoom link for online defence (English)
Subject area: Electrical Engineering
Doctoral student: Dominik Baumann, Reglerteknik, Max Planck Institute for Intelligent Systems, Division of Decision and Control Systems
Opponent: Professor Paulo Tabuada, University of California, Los Angeles
Supervisors: Karl H. Johansson, Signaler, sensorer och system, Reglerteknik, ACCESS Linnaeus Centre; Professor Sebastian Trimpe, RWTH Aachen University
Cyber-physical systems (CPS) integrate physical processes with computing and communication to autonomously interact with the environment. This enables emerging applications such as autonomous driving or smart factories. However, current technology does not provide the dependability and adaptability needed to realize these applications. CPS are systems with complex dynamics that need to be adaptive, communicate with each other over wireless channels, and provide theoretical guarantees on proper functioning. In this thesis, we take on the challenges imposed by wireless CPS by developing appropriate learning and control strategies.
In the first part of the thesis, we present a holistic approach that enables provably stable feedback control over wireless networks. At design time (i.e., prior to execution), we tame typical imperfections inherent in wireless networks, such as communication delays and message loss. The remaining imperfections are then accounted for through feedback control. At run time (i.e., during execution), we let systems reason about communication demands and allocate communication resources accordingly. We provide theoretical stability guarantees and evaluate the approach on a cyber-physical testbed, featuring a multi-hop wireless network supporting multiple cart-pole systems.
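The run-time idea of spending communication resources only where they are needed can be illustrated with a minimal sketch. It is not the design from the thesis: it assumes a scalar linear system with a known model, a remote predictor that runs the same model, and a simple threshold rule that sends the state only when the prediction error exceeds `delta`. All names and parameter values are hypothetical.

```python
import random


def simulate_event_triggered(a=0.9, delta=0.5, steps=200, noise_std=0.2, seed=0):
    """Simulate x[k+1] = a*x[k] + w[k] with a threshold-based transmission rule.

    Returns the number of transmissions and the largest prediction error
    that remains after the transmission decision at each step.
    """
    rng = random.Random(seed)
    x = 0.0        # true state of the (assumed) scalar plant
    x_hat = 0.0    # remote prediction of the state
    transmissions = 0
    max_error = 0.0
    for _ in range(steps):
        # Plant is disturbed by noise; the remote side can only predict.
        x = a * x + rng.gauss(0.0, noise_std)
        x_hat = a * x_hat
        # Transmit only when the prediction has drifted too far.
        if abs(x - x_hat) > delta:
            x_hat = x
            transmissions += 1
        max_error = max(max_error, abs(x - x_hat))
    return transmissions, max_error


if __name__ == "__main__":
    sent, err = simulate_event_triggered()
    print(f"transmissions: {sent} of 200 steps, max remaining error: {err:.3f}")
```

Raising `delta` in this sketch lowers the number of transmissions and allows a larger prediction error, which is the trade-off between communication and control quality that the paragraph above describes.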
In the second part, we enhance the flexibility of our designs through learning. We first propose a framework based on deep reinforcement learning to jointly learn control and communication strategies for wireless CPS by integrating both objectives, control performance and communication savings, in the reward function. This enables learning of resource-aware controllers for nonlinear and high-dimensional systems. Second, we propose an approach for evaluating the performance of models of wireless CPS through online statistical analysis. We trigger learning only when performance drops, which limits the number of learning experiments and reduces computational complexity. Third, we propose an algorithm for identifying the causal structure of control systems. We provide theoretical guarantees on learning the true causal structure and demonstrate enhanced generalization capabilities gained through causal structure identification on a real robotic system.
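Combining both objectives in a single reward can be sketched as follows. This is an illustration under assumptions, not the reward used in the thesis: the quadratic control cost, the weights `action_weight` and `comm_penalty`, and the function name are all hypothetical.

```python
def resource_aware_reward(state, action, communicate,
                          action_weight=0.01, comm_penalty=0.1):
    """Hypothetical reward: negative quadratic control cost, minus a
    fixed penalty whenever the agent decides to use the network."""
    # Control objective: keep the state near zero with little actuation.
    control_cost = sum(x * x for x in state) + action_weight * sum(u * u for u in action)
    # Communication objective: each transmission costs a fixed amount.
    communication_cost = comm_penalty if communicate else 0.0
    return -(control_cost + communication_cost)


if __name__ == "__main__":
    print(resource_aware_reward([1.0, 0.0], [0.0], communicate=False))
    print(resource_aware_reward([1.0, 0.0], [0.0], communicate=True))
```

In such a formulation, a larger `comm_penalty` would push a learned policy toward fewer transmissions at the expense of control performance.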