
EL2805 Reinforcement Learning 7.5 credits

Reinforcement Learning (RL) addresses the problem of controlling a dynamical system so as to maximize a notion of reward accumulated over time. At each time step (or round), the agent selects an action, and as a result, the system state evolves. The agent observes the new state and collects a reward associated with the state transition before deciding on the next action. Unlike classical control tasks, where the system dynamics are typically completely predictable, RL is concerned with systems whose dynamics have to be learnt, or with systems interacting with an uncertain environment. As time evolves, the agent gathers more data and may improve its knowledge of the system dynamics to make better-informed decisions. RL has found numerous applications, ranging from robotics and control to online services and game playing, and has received increasing attention. Very recently, RL has solved problems in situations approaching real-world complexity, e.g., learning human-level control for playing video and board games. These situations are, however, rather specific, and we are still far from systems able to learn in a wide variety of scenarios the way humans do.
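
The interaction loop described above can be sketched in a few lines of code. The sketch below is purely illustrative and not part of the course material; it assumes the Gymnasium library and its CartPole-v1 environment, and uses a random policy as a placeholder for the agent.

```python
# Minimal sketch of the RL interaction loop: observe state, act, collect reward.
# The Gymnasium library and the CartPole-v1 environment are illustrative choices.
import gymnasium as gym

env = gym.make("CartPole-v1")           # a system with uncertain dynamics
state, _ = env.reset(seed=0)
total_reward = 0.0

for t in range(200):                    # one episode of at most 200 rounds
    action = env.action_space.sample()  # placeholder policy: act at random
    # The system evolves; the agent observes the new state and a reward.
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break

print(f"Reward accumulated over the episode: {total_reward}")
```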

The course provides an in-depth treatment of the modern theoretical tools used to devise and analyse RL algorithms. It includes an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and further presents the rationale behind the design of more recent algorithms, such as those striking an optimal trade-off between exploration and exploitation. The course also covers algorithms used in recent RL success stories, e.g., deep RL algorithms.


Application

For course offering

Autumn 2024 Start 28 Oct 2024 programme students

Application code

50534

Headings with content from the Course syllabus EL2805 (Autumn 2023–) are denoted with an asterisk (*)

Content and learning outcomes

Course contents

The course gives an in-depth treatment of the modern theoretical tools that are used to design and analyse reinforcement learning (RL) algorithms. It contains an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and furthermore presents the rationale behind the design of the latest algorithms, such as those striking an optimal trade-off between exploration and exploitation. The course also covers algorithms used in the latest RL success stories, e.g., deep RL algorithms.

Markov chains, Markov decision processes (MDPs), dynamic programming with value and policy iteration, design of approximate controllers for MDPs, stochastic linear quadratic control, the multi-armed bandit problem, RL algorithms (Q-learning, Q-learning with function approximation).
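
To make the dynamic-programming part of the contents concrete, the following is a minimal value-iteration sketch on a toy two-state MDP; the MDP, its transition probabilities, rewards and discount factor are invented for illustration and do not come from the course.

```python
# Minimal value-iteration sketch on a toy 2-state, 2-action MDP.
# The MDP (P, R) and the discount factor gamma are illustrative assumptions.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[a, s, s']: transition probabilities
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                 # R[s, a]: expected rewards
              [0.5, 2.0]])

V = np.zeros(n_states)
for _ in range(1000):                     # repeatedly apply the Bellman optimality operator
    Q = np.array([[R[s, a] + gamma * P[a, s] @ V
                   for a in range(n_actions)] for s in range(n_states)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print("Optimal value function:", V)
print("Greedy (optimal) policy:", Q.argmax(axis=1))
```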

Intended learning outcomes

After passing the course, the student should be able to

  • carefully formulate stochastic control problems as Markov decision process (MDP) problems, classify equivalent problems and evaluate their tractability
  • state the principle of optimality for finite and infinite time horizon MDPs and solve MDPs by means of dynamic programming
  • derive solutions to MDPs using value and policy iteration
  • solve control problems for systems whose dynamics must be learnt using the Q-learning and SARSA algorithms (a minimal sketch follows this list)
  • explain the difference between on-policy and off-policy algorithms
  • develop and implement RL algorithms with function approximation (for example deep RL algorithms, where the Q-function is approximated by the output of a neural network)
  • solve bandit optimisation problems.
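
As a companion to the outcomes on Q-learning and on-policy/off-policy methods, here is a minimal tabular Q-learning sketch with epsilon-greedy exploration, run on the same invented toy MDP as in the value-iteration sketch above; the learning rate, exploration rate and horizon are illustrative assumptions. Replacing the bootstrap target max_a Q(s', a) with Q(s', a') for the action a' actually selected next would turn this off-policy update into the on-policy SARSA update.

```python
# Minimal tabular Q-learning sketch with epsilon-greedy exploration on the
# same toy MDP as above; alpha, epsilon and the horizon are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[a, s, s']
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                 # R[s, a]
              [0.5, 2.0]])

Q = np.zeros((n_states, n_actions))
alpha, epsilon = 0.1, 0.1
s = 0
for t in range(50_000):
    # Epsilon-greedy action selection: explore with probability epsilon, else exploit.
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[a, s])
    # Off-policy (Q-learning) update: bootstrap on the greedy value of the next state.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("Learned Q-values:\n", Q)
print("Greedy policy:", Q.argmax(axis=1))
```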

Literature and preparations

Specific prerequisites

For non-program students: 120 higher education credits and documented knowledge in English B or equivalent.

Recommended prerequisites

No information inserted

Equipment

No information inserted

Literature

No information inserted

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

A, B, C, D, E, FX, F

Examination

  • HEM1 - Homework 1, 1.0 credits, grading scale: P, F
  • HEM2 - Homework 2, 1.0 credits, grading scale: P, F
  • LAB1 - Lab 1, 1.0 credits, grading scale: P, F
  • LAB2 - Lab 2, 1.0 credits, grading scale: P, F
  • TENA - Written exam, 3.5 credits, grading scale: A, B, C, D, E, FX, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted

Examiner

Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students find further information about the implementation of the course in the course room in Canvas. A link to the course room can be found under the tab Studies in the Personal menu at the start of the course.

Offered by

Main field of study

Electrical Engineering

Education cycle

Second cycle

Add-on studies

No information inserted

Contact

Alexandre Proutiere (alepro@kth.se)

Supplementary information

https://www.kth.se/student/kurser/kurs/EL2805?l=en

In this course, the EECS code of honor applies, see:
http://www.kth.se/en/eecs/utbildning/hederskodex.