# Dual control concepts for linear dynamical systems

**Time: **
Fri 2022-09-23, 10:00

**Location: **
F3, Lindstedtsvägen 26 & 28, Stockholm

**Video link: **
https://kth-se.zoom.us/j/62470041935

**Language: **
English

**Subject area: **
Electrical and Systems Engineering

**Respondent: **
Mina Ferizbegovic, Automatic Control

**Opponent: **
Professor Roy Smith

**Supervisors: **
Håkan Hjalmarsson, Automatic Control; Professor Thomas B. Schön; Cristian R. Rojas, Automatic Control

QC 20220830

## Abstract

We study simultaneous learning and control of linear dynamical systems. In such a setting, control policies are derived with respect to two objectives: i) to control the system as well as possible, given the current knowledge of the system dynamics (exploitation), and ii) to gather as much information as possible about the unknown system that can improve control (exploration). These two objectives are often in conflict, and this phenomenon is known as the exploration-exploitation trade-off. More specifically, in the context of simultaneous learning and control, we consider the linear quadratic regulation (LQR) problem, model reference control, and data-driven control based on the fundamental lemma of Willems et al.

First, we consider the LQR problem with unknown dynamics. We present robust and certainty equivalence (CE) model-based control methods that balance exploration and exploitation. We focus on control policies that can be iteratively updated after sequentially collecting data.
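As a rough illustration of the certainty equivalence idea, the sketch below estimates the dynamics by least squares from collected data and then designs an LQR gain as if the estimate were exact. It is a minimal sketch, not the thesis's method: the example system, the noise level, the cost weights, and the helper names `estimate_dynamics` and `lqr_gain` are all assumptions made for illustration.

```python
import numpy as np

def estimate_dynamics(X, U):
    """Least-squares estimate of (A, B) in x_{t+1} = A x_t + B u_t + w_t.

    X has shape (T+1, n) and U has shape (T, m).
    """
    n = X.shape[1]
    Z = np.hstack([X[:-1], U])                          # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)   # shape (n+m, n)
    return Theta[:n].T, Theta[n:].T

def lqr_gain(A, B, Q, R, iters=1000):
    """Infinite-horizon LQR gain K (u = -K x) via Riccati fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)                   # discrete Riccati update
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Hypothetical example system (a discretized double integrator), assumed here.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Exploration episode: excite the system with white noise and record the data.
rng = np.random.default_rng(0)
T = 200
U = rng.standard_normal((T, 1))
X = np.zeros((T + 1, 2))
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + 0.01 * rng.standard_normal(2)

# Exploitation episode: design the controller as if the estimate were the truth.
A_hat, B_hat = estimate_dynamics(X, U)
K = lqr_gain(A_hat, B_hat, np.eye(2), np.eye(1))
rho = max(abs(np.linalg.eigvals(A_true - B_true @ K)))  # closed-loop spectral radius
```

A spectral radius `rho` below one indicates that the gain designed from the estimate stabilizes the true system in this example. A robust design would additionally account for the uncertainty in `A_hat` and `B_hat`, which this sketch ignores.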

We propose LQR design methods that are robust with respect to parameter uncertainty. To quantify uncertainty, we derive a method based on Bayesian inference, which is directly applicable to robust control synthesis. To begin, we derive a robust controller that minimizes the worst-case cost, with high probability, given the empirical observations of the system. This robust controller synthesis is then used to derive a robust dual controller, which updates its control policy after collecting data. An episode in which data is collected is called exploration, and an episode that uses an updated control policy is called exploitation. The objective is to minimize the worst-case cost of the updated control policy, subject to a given exploration budget that constrains the worst-case cost during exploration. Additionally, we derive methods that balance exploration and exploitation to minimize the cumulative worst-case cost for a fixed number of episodes. In this thesis, we refer to such a problem as robust reinforcement learning. Essentially, it is a robust dual controller that aims to minimize the cumulative worst-case cost and that updates its control policy in each episode. Numerical experiments show that the proposed methods perform better than existing state-of-the-art algorithms. Moreover, the experiments indicate that the exploration prioritizes uncertainty reduction in the parameters that matter most for control.

A control policy using the CE principle for LQR consists of the sum of an optimal controller calculated using the dynamics estimated at time $t$ and an additive external excitation. It has been shown over the years that the optimal asymptotic rate of regret is in many instances $\mathcal{O}(\sqrt{T})$. In particular, this rate can be obtained by adding a white noise external excitation with a variance decaying as $\gamma/\sqrt{T}$, where $\gamma$ is a predefined constant. As the amount of excitation is predetermined, such approaches can be viewed as open-loop control of the external excitation. In this thesis, we approach the problem of designing the external excitation from a feedback perspective, leveraging the well-known benefits of feedback control for decreasing sensitivity to external disturbances and system-model mismatch, as compared to open-loop strategies. The benefits of this approach over the open-loop approach can be seen in the case of unmodeled dynamics and disturbances. However, even when using the benefits of feedback control, we do not calculate the optimal amount of external excitation. To find the optimal amount of external excitation, we suggest exploration strategies that are based on a time-dependent scaling $\gamma_t$ and that, according to numerical examples, can attain cumulative regret similar to or lower than the cumulative regret obtained for the optimal scaling $\gamma^*$.
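The open-loop excitation scheme described above can be sketched for a scalar system as follows. This is only an illustrative sketch under stated assumptions: the excitation variance at step $t$ is taken to be $\gamma/\sqrt{t}$, the system parameters, initial guess, regularization, and unit cost weights are hypothetical, and the function names are not from the thesis. The feedback-based excitation design proposed in the thesis is not implemented here.

```python
import numpy as np

def scalar_lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """LQR gain k (u = -k x) for x_{t+1} = a x_t + b u_t via Riccati iteration."""
    p = q
    for _ in range(iters):
        k = a * b * p / (r + b * b * p)
        p = q + a * p * (a - b * k)
    return a * b * p / (r + b * b * p)

def run_ce_with_decaying_excitation(a=0.9, b=0.5, gamma=1.0, T=500,
                                    sigma_w=0.1, seed=0):
    """Certainty equivalence control plus white noise excitation.

    The excitation variance gamma / sqrt(t) is an assumed instance of the
    predetermined decay discussed in the text.
    """
    rng = np.random.default_rng(seed)
    theta = np.array([0.5, 1.0])        # hypothetical initial guess of (a, b)
    G = 1e-3 * np.eye(2)                # regularized Gram matrix of regressors
    h = G @ theta                       # keeps the initial guess as a weak prior
    x, cost = 0.0, 0.0
    for t in range(1, T + 1):
        k = scalar_lqr_gain(theta[0], theta[1])           # CE gain from estimate
        e = np.sqrt(gamma / np.sqrt(t)) * rng.standard_normal()
        u = -k * x + e                                    # control plus excitation
        x_next = a * x + b * u + sigma_w * rng.standard_normal()
        z = np.array([x, u])
        G += np.outer(z, z)                               # least-squares update
        h += z * x_next
        theta = np.linalg.solve(G, h)
        cost += x * x + u * u                             # stage cost, q = r = 1
        x = x_next
    return theta, cost

theta_hat, total_cost = run_ce_with_decaying_excitation()
```

Comparing `total_cost` across different values of `gamma` gives a feel for the trade-off: too little excitation slows learning, while too much excitation inflates the cost directly.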

Second, we consider the model reference control problem, with the goal of proposing a data-driven robust control design method based on an average risk criterion, which we call Bayes control. We show that this approach has very close ties to the Bayesian kernel-based method; the conceptual difference lies in the use of a deterministic versus a stochastic setting for the system parameters.

Finally, we consider data-driven control using the fundamental lemma of Willems et al. First, we propose variations of the fundamental lemma that, instead of a data trajectory, utilize correlation functions in the time domain, as well as power spectra of the input and the output in the frequency domain. Data-driven control using the fundamental lemma can become computationally very expensive for large datasets, whereas the proposed variations are easy to compute even for large datasets and can serve as an efficient data compression technique. Second, we study connections between the data informativity conditions obtained from the fundamental lemma (finite time) and those from classical system identification. We show that finite-time informativity conditions for state-space systems are closely linked to the identifiability conditions derived from the fundamental lemma. We prove that the obtained persistency of excitation conditions for infinite time are sufficient conditions for finite-time informativity. Moreover, we reveal that the obtained finite-time conditions in closed loop are stricter than in classical system identification. This is a consequence of the noiseless data setting in the fundamental lemma, which precludes the possibility of noise exciting the system in a feedback setting.
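For readers unfamiliar with the fundamental lemma, the following sketch illustrates its standard trajectory-based form, not the correlation or spectral variations proposed in the thesis. It builds Hankel matrices from noiseless data, checks persistency of excitation of the input through a rank condition, and verifies that the stacked input-output Hankel matrix has the rank expected for a controllable system. The example system, the data length, and the helper names `block_hankel` and `is_persistently_exciting` are assumptions for illustration.

```python
import numpy as np

def block_hankel(w, depth):
    """Block Hankel matrix with `depth` block rows from a signal w of shape (T, m)."""
    w = np.asarray(w, dtype=float)
    if w.ndim == 1:
        w = w[:, None]                     # treat a 1-D signal as a single channel
    cols = w.shape[0] - depth + 1
    return np.column_stack([w[i:i + depth].reshape(-1) for i in range(cols)])

def is_persistently_exciting(u, order):
    """True if the depth-`order` Hankel matrix of u has full row rank."""
    H = block_hankel(u, order)
    return bool(np.linalg.matrix_rank(H) == H.shape[0])

# Hypothetical controllable and observable SISO system of order n = 2.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
T, L, n = 50, 4, 2                          # data length, window length, order
u = rng.standard_normal(T)
y = np.zeros(T)
x = np.zeros(n)
for t in range(T):                          # noiseless data, as the lemma assumes
    y[t] = C @ x
    x = A @ x + B * u[t]

# The lemma requires the input to be persistently exciting of order L + n.
pe_ok = is_persistently_exciting(u, L + n)

# The stacked Hankel matrix then spans all length-L trajectories: rank m*L + n.
H = np.vstack([block_hankel(u, L), block_hankel(y, L)])
traj_rank = np.linalg.matrix_rank(H, tol=1e-8)
```

The number of columns of `H` grows with the data length `T`, which is one way to see why computations based directly on the data trajectory become expensive for large datasets.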