Jakob Nylöf: Deep q-learning in continuous time
Master's Thesis
Time: Fri 2024-06-14 10.15 - 11.00
Location: KTH 3424 (lunch room)
Respondent: Jakob Nylöf
Supervisor: Boualem Djehiche
Abstract.
Reinforcement Learning (RL) focuses on designing agents that solve sequential decision-making problems by exploring and learning optimal actions through trial and error. Traditionally formulated in discrete time, RL algorithms such as Deep Q-learning teach agents the Q-function by means of function approximation with Deep Neural Networks (DNNs). Recent work by X. Y. Zhou and his co-authors proposes q-learning, a continuous-time Q-learning framework. In this setting, one focuses on the "q-function," the time derivative of the Q-function, which is learned via a martingale approach. This thesis introduces the concept of Deep q-learning, which approximates the optimal q-function and optimal value function with DNNs, analogously to Deep Q-learning. We adapt the q-learning algorithms of Jia and Zhou (2023), obtaining offline and online Deep q-learning algorithms. Furthermore, we prove that the discretization errors associated with the q-learning algorithms decrease as the time-discretization step approaches zero, and we demonstrate convergence of the offline Deep q-learning algorithm through numerical simulations.
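For orientation, the relation between the two objects can be sketched as follows; the notation is assumed here, following Jia and Zhou (2023), and is not quoted from the thesis. Writing $Q_{\Delta t}$ for the conventional Q-function of the problem discretized with time step $\Delta t$ and $J^{*}$ for the optimal value function, the q-function is the first-order coefficient in the expansion
\[
Q_{\Delta t}(t, x, a) = J^{*}(t, x) + q(t, x, a)\,\Delta t + o(\Delta t),
\qquad \text{i.e.} \qquad
q(t, x, a) = \lim_{\Delta t \to 0} \frac{Q_{\Delta t}(t, x, a) - J^{*}(t, x)}{\Delta t},
\]
which is the sense in which the q-function is the time derivative of the Q-function. Deep q-learning then replaces $q$ and $J^{*}$ by DNN approximators.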