Robust learning and control of linear dynamical systems
Time: Thu 2020-10-01 10.00
Location: Zoom link for online defense (English)
Subject area: Electrical Engineering
Doctoral student: Mina Ferizbegovic, Reglerteknik, ACCESS Linnaeus Centre
Opponent: Associate Professor Florian Dörfler
Supervisor: Håkan Hjalmarsson, Signaler, sensorer och system, Reglerteknik; Professor Thomas B. Schön; Cristian R. Rojas, Reglerteknik
We consider the linear quadratic regulation problem when the plant is an unknown linear dynamical system. We present robust model-based methods based on convex optimization, which minimize the worst-case cost with respect to uncertainty around model estimates. To quantify uncertainty, we derive a method based on Bayesian inference, which is directly applicable to robust control synthesis.

We focus on control policies that can be iteratively updated after sequentially collecting data. More specifically, we seek to design control policies that balance exploration (reducing model uncertainty) and exploitation (control of the system) when exploration must be safe (robust).

First, we derive a robust controller to minimize the worst-case cost, with high probability, given the empirical observation of the system. This robust controller synthesis is then used to derive a robust dual controller, which updates its control policy after collecting data. An episode in which data is collected is called exploration, and the episode using an updated control policy is called exploitation. The objective is to minimize the worst-case cost of the updated control policy, subject to the constraint that the worst-case cost during exploration stays within a given exploration budget.

We look into robust dual control in both finite and infinite horizon settings. The main difference between the two is that the infinite horizon setting does not consider the lengths of the exploration and exploitation phases, but instead approximates the cost using the infinite horizon cost. In the finite horizon setting, we discuss how different exploration lengths affect the trade-off between exploration and exploitation.

Additionally, we derive methods that balance exploration and exploitation to minimize the cumulative worst-case cost for a fixed number of episodes. In this thesis, we refer to such a problem as robust reinforcement learning. Essentially, it is a robust dual controller that aims to minimize the cumulative worst-case cost and updates its control policy in each episode.

Numerical experiments show that the proposed methods perform better than existing state-of-the-art algorithms. Moreover, the experiments indicate that exploration prioritizes uncertainty reduction in the parameters that matter most for control.
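To make the setting in the abstract concrete, the following is a minimal Python sketch of what "worst-case cost with respect to uncertainty around model estimates" can mean for a linear quadratic regulator. It is not the method of the thesis, which uses convex optimization and a Bayesian uncertainty description. Here the controller is simply designed for the nominal estimate, and its cost is evaluated on models sampled around that estimate. The names `A_hat`, `B_hat`, `eps`, the example system, and the sampling scheme are all assumptions made for illustration.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return (K, P) for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

def closed_loop_cost(A, B, Q, R, K, iters=2000):
    """Trace of the infinite-horizon cost matrix under u = -K x (inf if unstable)."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    S = Q + K.T @ R @ K
    X = np.zeros_like(Q)
    for _ in range(iters):
        X = S + Acl.T @ X @ Acl  # fixed-point iteration of the Lyapunov equation
    return np.trace(X)

# Hypothetical nominal estimate (a discretized double integrator).
A_hat = np.array([[1.0, 0.1], [0.0, 1.0]])
B_hat = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

K, P = lqr_gain(A_hat, B_hat, Q, R)
nominal_cost = closed_loop_cost(A_hat, B_hat, Q, R, K)

# Assumed uncertainty set: perturbations of [A B] with Frobenius norm eps.
eps = 0.01
rng = np.random.default_rng(0)
costs = [nominal_cost]
for _ in range(200):
    D = rng.standard_normal((2, 3))
    D *= eps / np.linalg.norm(D)
    costs.append(closed_loop_cost(A_hat + D[:, :2], B_hat + D[:, 2:], Q, R, K))

worst_cost = max(costs)
print("nominal cost:", nominal_cost)
print("worst sampled cost:", worst_cost)
```

Sampling only gives a lower bound on the true worst case over the set. A robust synthesis of the kind described above would instead choose the gain to minimize a guaranteed upper bound on that worst case.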