EL2805 Reinforcement Learning 7.5 credits

Reinforcement Learning (RL) addresses the problem of controlling a dynamical system so as to maximize a notion of reward accumulated over time. At each time step (or round), the agent selects an action, and as a result, the system state evolves. The agent observes the new state and collects a reward associated with the state transition, before deciding on the next action. Unlike classical control tasks, where the system dynamics are typically completely predictable, RL is concerned with systems whose dynamics have to be learnt or with systems interacting with an uncertain environment. As time evolves, the agent gathers more data and may improve her knowledge of the system dynamics to make better-informed decisions. RL has found numerous applications, ranging from robotics and control to online services and game playing, and has received increasing attention. Very recently, RL has solved problems in situations approaching real-world complexity, e.g., learning human-level control for playing video and board games. These situations are, however, rather specific, and we are still far from systems able to learn in a wide variety of scenarios as humans do.
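
The interaction loop described above (act, observe the new state, collect a reward, act again) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not course material: `ToyChainEnv` is a hypothetical five-state chain in which action 1 moves right and action 0 moves left, the chosen action is flipped with probability 0.1 to make the dynamics uncertain, and reaching the last state pays a reward of 1 and sends the agent back to state 0. `run_episode` is likewise a hypothetical helper that sums the rewards collected over a fixed horizon.

```python
import random


class ToyChainEnv:
    """Hypothetical chain environment used only to illustrate the RL loop."""

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        # Uncertain dynamics: with probability 0.1 the chosen action is flipped.
        if self.rng.random() < 0.1:
            action = 1 - action
        if action == 1:
            next_state = min(self.state + 1, self.n_states - 1)
        else:
            next_state = max(self.state - 1, 0)
        # Reward 1 for reaching the rightmost state, 0 otherwise.
        reward = 1.0 if next_state == self.n_states - 1 else 0.0
        # After reaching the goal, the agent is sent back to the start state.
        self.state = 0 if reward > 0 else next_state
        return self.state, reward


def run_episode(env, policy, horizon):
    """Agent-environment loop: select an action, observe the new state, collect the reward."""
    state, total_reward = env.state, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward
```

For example, `run_episode(ToyChainEnv(seed=1), lambda s: 1, 1000)` (always move right) collects far more reward than the always-left policy `lambda s: 0`. An RL agent would not be handed such a policy; it would have to discover it from the observed transitions and rewards.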

The course provides an in-depth treatment of the modern theoretical tools used to devise and analyse RL algorithms. It includes an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and further presents the rationale behind the design of more recent algorithms, such as those striking an optimal trade-off between exploration and exploitation. The course also covers algorithms used in recent RL success stories, i.e., deep RL algorithms.
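
The exploration/exploitation trade-off mentioned above is easiest to see in the multi-armed bandit setting. As one standard illustration (the choice of UCB1 is an assumption here, not necessarily the algorithm covered in the course), the sketch below adds an optimism bonus to each arm's empirical mean reward: rarely pulled arms get a large bonus (exploration), while well-estimated good arms are favoured (exploitation). The Bernoulli arm means and the function name `ucb1` are illustrative.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """UCB1 on a Bernoulli multi-armed bandit; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k     # number of pulls per arm
    sums = [0.0] * k     # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Sample a Bernoulli reward from the chosen (hidden) arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

With `ucb1([0.2, 0.5, 0.8], 5000)`, most pulls go to the best arm, while the worse arms are still sampled occasionally, and more rarely as their estimates become reliable.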

About course offering

For course offering

Autumn 2024 Start 28 Oct 2024 programme students

Target group

Open to all programmes as long as it can be included in your programme.

Part of programme

Master's Programme, Aerospace Engineering, year 2, Optional

Master's Programme, Aerospace Engineering, year 2, SYS, Optional

Master's Programme, Cybersecurity, year 1, Recommended

Master's Programme, Cybersecurity, year 2, Recommended

Master's Programme, Embedded Systems, year 1, INMV, Recommended

Master's Programme, Embedded Systems, year 1, INPF, Recommended

Master's Programme, ICT Innovation, year 2, VCCN, Recommended

Master's Programme, Industrial Engineering and Management, year 1, MAIG, Conditionally Elective

Master's Programme, Information and Network Engineering, year 1, Recommended

Master's Programme, Information and Network Engineering, year 2, Recommended

Master's Programme, Machine Learning, year 1, Conditionally Elective

Master's Programme, Machine Learning, year 2, Conditionally Elective

Master's Programme, Mechatronics, year 1, Conditionally Elective

Master's Programme, Mechatronics, year 2, Conditionally Elective

Master's Programme, Systems, Control and Robotics, year 1, Recommended

Master's Programme, Systems, Control and Robotics, year 1, LDCS, Conditionally Elective

Master's Programme, Systems, Control and Robotics, year 1, RASM, Conditionally Elective

Master's Programme, Systems, Control and Robotics, year 2, Recommended

Master's Programme, Systems, Control and Robotics, year 2, LDCS, Conditionally Elective

Master's Programme, Systems, Control and Robotics, year 2, RASM, Conditionally Elective

Periods

P2 (7.5 credits)

Duration

28 Oct 2024 – 13 Jan 2025

Pace of study

50%

Form of study

Normal Daytime

Language of instruction

English

Course location

KTH Campus

Number of places

Places are not limited

Planned modular schedule

Course memo

Course memo is not published

Application

Application code

50534

Contact

Contact

Alexandre Proutiere (alepro@kth.se)

Examiner

No information inserted

Course coordinator

No information inserted

Teachers

No information inserted

Headings with content from the Course syllabus EL2805 (Autumn 2023–) are denoted with an asterisk.

Content and learning outcomes

Course contents

The course gives an in-depth treatment of the modern theoretical tools that are used to design and analyse reinforcement learning algorithms (RL algorithms). It contains an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and furthermore presents the rationale behind the design of the latest algorithms, such as those striking an optimal trade-off between exploration and exploitation. The course also covers algorithms that are used in the latest RL success stories, e.g., deep RL algorithms.

Markov chains, Markov decision processes (MDPs), dynamic programming, value and policy iteration, design of approximate controllers for MDPs, stochastic linear quadratic control, the multi-armed bandit problem, RL algorithms (Q-learning, Q-learning with function approximation).
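
Value iteration, listed among the topics above, solves a finite MDP by dynamic programming: it repeatedly applies the Bellman optimality update V(s) ← max_a [R(s, a) + γ Σ_s' P(s' | s, a) V(s')] until the values stop changing, then reads off a greedy policy. The sketch below is a minimal version under stated assumptions: `P[s][a]` is a list of `(probability, next_state)` pairs, `R[s][a]` is the expected one-step reward, the discount factor γ is below 1, and the name `value_iteration` and this data layout are chosen here for illustration only.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite discounted MDP.

    P[s][a]: list of (probability, next_state) pairs.
    R[s][a]: expected immediate reward for taking action a in state s.
    """
    n = len(P)

    def q_value(V, s, a):
        # One-step lookahead: reward plus discounted expected next-state value.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * n
    while True:
        # Bellman optimality update applied to every state.
        V_new = [max(q_value(V, s, a) for a in range(len(P[s]))) for s in range(n)]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the (near-)optimal value function.
    policy = [max(range(len(P[s])), key=lambda a: q_value(V, s, a)) for s in range(n)]
    return V, policy
```

For a two-state example where action 1 in state 0 moves to state 1, and action 0 in state 1 stays there with reward 1, the optimal values with γ = 0.9 are V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 · 10 = 9, and the returned greedy policy is `[1, 0]`.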

Intended learning outcomes

After passing the course, the student should be able to

carefully formulate stochastic control problems as Markov decision process (MDP) problems, classify equivalent problems and evaluate their tractability
state the principle of optimality for finite- and infinite-horizon MDPs and solve MDPs by means of dynamic programming
derive solutions to MDPs using value and policy iteration
use the Q-learning and SARSA algorithms to solve control problems for systems whose dynamics must be learnt
explain the difference between on-policy and off-policy algorithms
develop and implement RL algorithms with function approximation (for example deep RL algorithms where the Q function is approximated by the output of a neural network)
solve bandit optimisation problems.
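
The difference between on-policy and off-policy algorithms shows up directly in the tabular Q-learning and SARSA updates. Both move Q(s, a) towards a bootstrapped target, but Q-learning (off-policy) uses the greedy value max_a' Q(s', a'), whereas SARSA (on-policy) uses Q(s', a') for the action a' actually selected by the behaviour policy, here assumed to be ε-greedy. The sketch below is illustrative only: Q is stored as a list of lists indexed by integer states and actions, and the function names and step size `alpha` are hypothetical choices, not taken from the course.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Behaviour policy: explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the greedy action in s_next, whatever is played next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action a_next actually chosen by the behaviour policy."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

Starting from `Q = [[0.0, 0.0], [1.0, 3.0]]` with r = 1, α = 0.5 and γ = 0.9, a Q-learning update of Q[0][0] after moving to state 1 gives 0.5 · (1 + 0.9 · 3) = 1.85, while a SARSA update in which the behaviour policy then explores with a_next = 0 gives 0.5 · (1 + 0.9 · 1) = 0.95 (up to floating-point rounding).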

Literature and preparations

Specific prerequisites

For non-programme students: 120 higher education credits and documented proficiency in English B or equivalent.

Recommended prerequisites

No information inserted

Equipment

No information inserted

Literature

No information inserted

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

A, B, C, D, E, FX, F

Examination

HEM1 - Homework 1, 1.0 credits, grading scale: P, F
HEM2 - Homework 2, 1.0 credits, grading scale: P, F
LAB1 - Lab 1, 1.0 credits, grading scale: P, F
LAB2 - Lab 2, 1.0 credits, grading scale: P, F
TENA - Written exam, 3.5 credits, grading scale: A, B, C, D, E, FX, F

Based on a recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with a documented disability.

The examiner may apply another examination format when re-examining individual students.

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted

Examiner

Alexandre Proutiere

Ethical approach

All members of a group are responsible for the group's work.
In any assessment, every student shall honestly disclose any help received and sources used.
In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students find further information about the implementation of the course in the course room in Canvas. A link to the course room can be found under the tab Studies in the Personal menu at the start of the course.

Offered by

EECS/Intelligent Systems

Main field of study

Electrical Engineering

Education cycle

Second cycle

Add-on studies

No information inserted

Contact

Alexandre Proutiere (alepro@kth.se)

Supplementary information

https://www.kth.se/student/kurser/kurs/EL2805?l=en.

In this course, the EECS code of honor applies, see:
http://www.kth.se/en/eecs/utbildning/hederskodex.