The course gives an in-depth treatment of the modern theoretical tools used to design and analyse reinforcement learning (RL) algorithms. It includes an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and furthermore presents the reasoning behind the design of the latest algorithms, such as those that strike an optimal trade-off between exploration and exploitation. The course also covers algorithms used in the recent success stories of RL, e.g., deep RL algorithms.
Markov chains, Markov decision processes (MDPs), dynamic programming, value and policy iteration, design of approximate controllers for MDPs, stochastic linear quadratic control, the multi-armed bandit problem, RL algorithms (Q-learning, Q-learning with function approximation).