# Statistical Learning, Dynamics and Control

Fast Rates and Fundamental Limits for Square Loss

**Tid: **
Fr 2022-11-11 kl 15.00

**Plats: **
Kollegiesalen, Brinellvägen 8, Stockholm

**Videolänk: **
https://kth-se.zoom.us/j/63439289698

**Språk: **
Engelska

**Ämnesområde: **
Elektro- och systemteknik

**Respondent: **
Ingvar Ziemann
, Reglerteknik

**Opponent: **
Professor Raginsky Maxim, University of Illinois at Urbana-Champaign, IL, USA

**Handledare: **
Professor Henrik Sandberg, Reglerteknik

QC 20221019

## Abstract

Learning algorithms play an ever increasing role in modern engineering solutions. However, despite many recent successes, their performance in the context of dynamical and control systems is not exactly well-understood. This becomes especially pertinent as we move toward deploying these algorithms in safety-critical systems such as our power grids or smart mobility systems. These shortcomings motivate the present thesis. Broadly, by blending tools from control theory, information theory and statistical learning theory, this thesis seeks to further our understanding of when learning in the context of (controlled) dynamical systems is easy or hard. That is, we seek to quantify those properties which are key, and those which are not, for learning to succeed in this context.

In the first part of the thesis, we study learning with square loss (regression) in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk boundwhich shows that whenever a\emph{trajectory hypercontractivity} condition holds, the risk of the least-squares estimator on dependent data matches the \iid\ rate order-wise after a burn-in time.In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. However, recent work on linear dynamical systems and generalized linear models has shown that this is deflation is not always necessary. The main contribution of the first part of the thesis is to extend this observation to a much wider class of examples. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We present several examples for when this occurs: bounded function classes forwhich the $L^2$ and $L^{2+\e}$ norms are equivalent, ergodic finite state Markov chains, various parametric models including linear dynamical systems and generalized linear models, and a broad family of infinite dimensional $\ell^2(\mathbb{N})$-ellipsoids.

In the second part of the thesis, we study fundamental limits to learning-based solutions of the linear quadratic Gaussian (LQG) problem. The majority of this part is devoted to the online (or adaptive) control problem, in which a learner (or control engineer) seeks to control an unknown linear system subject to a quadratic (square) loss function with minimal regret (cumulative suboptimality). %To solve this problem, the learner must solve a trade-off between exploration and exploitation. Put differently,The learner faces the dual objectives of controlling the system to the best of their present knowledge while simultaneously ensuring that enough information about the system is gathered. We prove fundamental limits - regret lower bounds - to this problem by relating the regret of any policy to an associated Fisher information and applying the Van Trees' Inequality (Bayesian Cramér-Rao). We instantiate our bounds for state-feedback and partially observed systems. We show that ill-conditioned systems - in terms of their controllability and observability structure - lead to larger regret. In other words, classical control-theoretic limitations carry over to the learning-based setting. We further show that systems as simple as integrators lead to \emph{exponential} in the problem dimension regret lower bounds. Finally, we also study the variance of policy gradient methods. We derive results similar in spirit to those for the online LQG problem: ill-conditioned systems, as per poor controllability or observability, yield noisier gradients.