Learning from Interactions
Forward and Inverse Decision-Making for Autonomous Dynamical Systems
Time: Thu 2023-11-23 10.00
Location: Kollegiesalen, Brinellvägen 8, Stockholm
Video link: https://kth-se.zoom.us/j/62028765716
Subject area: Electrical Engineering
Doctoral student: Inês de Miranda de Matos Lourenço, Reglerteknik
Opponent: Professor Sandra Hirche, Technical University of Munich, Munich, Germany
Supervisor: Professor Bo Wahlberg, Reglerteknik
Decision-making is the mechanism of using available information to generate solutions to given problems by forming preferences and beliefs and selecting courses of action among several alternatives. In this thesis, we study the mechanisms that generate behavior (the forward problem) and how their characteristics can explain observed behavior (the inverse problem). Both problems play a pivotal role in contemporary research, driven by the desire to design sophisticated autonomous agents that serve as the building blocks for a smart society amidst complexity, risk, and uncertainty. This work explores different parts of the autonomous decision-making process in which agents learn from interacting with each other and with the environment that surrounds them. We address fundamental problems of behavior modeling, of parameter estimation in the form of beliefs, distributions, and reward functions, and, finally, of interactions with other agents, which together lay the foundation for a complete and integrative framework for decision-making and learning. The thesis is divided into four parts, each featuring a different information-exchange paradigm.
First, we model the forward problem of how a decision-maker forms beliefs about the world and the inverse problem of estimating these beliefs from the agent's behavior. The private belief (posterior distribution) on the state of the world is formed by filtering private information according to a hidden Markov model. The ability to estimate private beliefs forms a foundation for predicting and counteracting future actions. We address the problems of i) how the private belief of the decision-maker can be estimated by observing its decisions (under two different scenarios), and ii) how the decision-maker can protect its private belief from an adversary by confusing it. We exemplify the applicability of our frameworks in regime-switching Markovian portfolio allocation.
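The belief formation described above is, in its standard form, the hidden Markov model filter: the previous belief is propagated through the transition matrix and then reweighted by the likelihood of the new private observation. The following is a minimal sketch of that standard filter, assuming finite state and observation spaces. The two-regime market model, its matrices, and the names `hmm_filter_step` and `hmm_filter` are hypothetical, chosen only for illustration, and are not taken from the thesis.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def hmm_filter_step(belief: Sequence[float], P: Matrix, B: Matrix, y: int) -> List[float]:
    """One Bayesian belief update of a finite hidden Markov model filter.

    belief: prior belief, belief[i] = Pr(x_{k-1} = i | y_1..y_{k-1})
    P:      transition matrix, P[i][j] = Pr(x_k = j | x_{k-1} = i)
    B:      observation likelihoods, B[j][y] = Pr(y_k = y | x_k = j)
    y:      index of the new private observation
    """
    n = len(belief)
    # Prediction: propagate the belief one step through the Markov chain.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = [B[j][y] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under the model")
    # Normalize so the posterior sums to one.
    return [u / total for u in unnormalized]


def hmm_filter(prior: Sequence[float], P: Matrix, B: Matrix,
               observations: Sequence[int]) -> List[List[float]]:
    """Run the filter over a sequence of observations, returning every posterior."""
    beliefs: List[List[float]] = []
    belief = list(prior)
    for y in observations:
        belief = hmm_filter_step(belief, P, B, y)
        beliefs.append(belief)
    return beliefs


# Hypothetical two-regime market: state 0 = "bull", state 1 = "bear";
# observation 0 = "price up", observation 1 = "price down".
P = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.4, 0.6]]
beliefs = hmm_filter([0.5, 0.5], P, B, [0, 0, 1])
print(beliefs[0])  # approximately [0.681, 0.319]
```

Two "up" observations push the belief toward the bull regime and a "down" observation pulls it back. The inverse problem studied in the thesis runs in the other direction: an observer sees only the decisions driven by such posteriors and tries to reconstruct them.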
In the second part, we study forward decision-making of biological systems and the inverse problem of how to obtain insight into their intrinsic characteristics. We focus on time perception (how humans and animals perceive the passage of time) and design a biologically inspired decision-making framework using reinforcement learning that replicates timing mechanisms. We show that a simulated robot equipped with our framework is able to perceive time similarly to animals, and that by analyzing its performed actions we are able to estimate the parameters of its timing mechanisms.
Next, we consider teacher-student settings where a teacher agent can intervene in the decision-making process of a student agent to assist it in performing a task. In the third part, we propose correctional learning as an approach where the teacher can intercept the observations the student collects from the system and modify them to improve the student's estimation process. We provide finite-sample results for batch correctional learning in system identification, generalize it to more complex systems using optimal transport, and derive lower bounds on the improvement in the estimate's variance for the online case.
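To make the idea of a teacher correcting a student's observations concrete, here is a hypothetical toy example, not the thesis's batch correctional-learning scheme, whose constraints and guarantees are not described in this abstract. It assumes a student that estimates the bias of a coin by the sample mean, and a teacher that knows the true bias and may flip at most `budget` observations before the student sees them. The function names and the greedy flipping rule are illustrative assumptions.

```python
from typing import List, Sequence


def correct_observations(samples: Sequence[int], p_true: float, budget: int) -> List[int]:
    """Hypothetical teacher: flip at most `budget` binary samples so that the
    number of ones moves toward the count closest to the true parameter."""
    n = len(samples)
    ones = sum(samples)
    # Count of ones whose empirical mean is closest to p_true.
    target = round(p_true * n)
    corrected = list(samples)
    flips_left = min(abs(ones - target), budget)
    # Too many ones: flip 1 -> 0; too few ones: flip 0 -> 1.
    src, dst = (1, 0) if ones > target else (0, 1)
    for i, x in enumerate(corrected):
        if flips_left == 0:
            break
        if x == src:
            corrected[i] = dst
            flips_left -= 1
    return corrected


def student_estimate(samples: Sequence[int]) -> float:
    """Student's estimator: the sample mean (maximum-likelihood for a Bernoulli)."""
    return sum(samples) / len(samples)


raw = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
fixed = correct_observations(raw, p_true=0.5, budget=2)
print(student_estimate(raw), student_estimate(fixed))  # 0.8 0.6
```

With a budget of two flips, the student's estimate moves from 0.8 to 0.6, closer to the true value 0.5. The budget stands in for the limit on how much the teacher may alter the data; how such limits are actually formulated in correctional learning is left to the thesis.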
Decision-making in teacher-student settings like the previous one requires both agents to have aligned models of each other. In the fourth and last part of this thesis, the teacher can instead alter the decisions of the decision-maker in a human-robot interaction setting. We use a confidence-based misalignment detection method that enables the robot to update its knowledge in proportion to its confidence in the human corrections, and we propose a framework to disambiguate between misalignment caused by incorrectly learned features that do not generalize to new environments and misalignment caused by features entirely missing from the robot's model. We demonstrate the proposed framework on a 7-degree-of-freedom robot manipulator with physical human corrections and show how to initiate the model realignment process once misalignment is detected.