
EL2805 Reinforcement Learning 7.5 credits

Reinforcement Learning (RL) addresses the problem of controlling a dynamical system so as to maximize a notion of reward accumulated over time. At each time (or round), the agent selects an action, and as a result, the system state evolves. The agent observes the new state and collects a reward associated with the state transition before deciding on the next action. Unlike classical control tasks, where the system dynamics are typically completely predictable, RL is concerned with systems whose dynamics have to be learnt, or with systems interacting with an uncertain environment. As time evolves, the agent gathers more data and may improve its knowledge about the system dynamics to make better informed decisions. RL has found numerous applications, ranging from robotics and control to online services and game playing, and has received increasing attention. Very recently, RL has solved problems in situations approaching real-world complexity, e.g., in learning human-level control for playing video and board games. These situations are however rather specific, and we are still far from systems able to learn in a wide variety of scenarios like humans do.
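
The interaction loop described above (observe state, select action, collect reward, repeat) can be sketched in a few lines of Python. The two-state system below is a purely hypothetical toy example, not part of the course material:

```python
import random

# Toy two-state system, illustrative only.
# P[state][action] = list of (next_state, probability)
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
}
R = {(0, 1): 1.0, (1, 1): 2.0}  # reward for (state, action); others give 0

def step(state, action):
    """Sample the next state and return the associated transition reward."""
    next_states, probs = zip(*P[state][action])
    next_state = random.choices(next_states, weights=probs)[0]
    return next_state, R.get((state, action), 0.0)

random.seed(0)
state, total_reward = 0, 0.0
for t in range(100):                   # one run of the interaction loop
    action = random.choice([0, 1])     # the agent picks an action (here: uniformly at random)
    state, reward = step(state, action)  # the system evolves; the agent observes both
    total_reward += reward             # reward accumulated over time
print(total_reward)
```

A learning agent would replace the uniformly random action choice with a policy that is improved as more transition data is gathered.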

The course provides an in-depth treatment of the modern theoretical tools used to devise and analyse RL algorithms. It includes an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and further presents the rationale behind the design of more recent algorithms, such as those striking an optimal trade-off between exploration and exploitation. The course also covers algorithms used in recent RL success stories, e.g., deep RL algorithms.



For course offering

Autumn 2024, start 28 Oct 2024, for programme students


Headings with content from the Course syllabus EL2805 (Autumn 2023–) are denoted with an asterisk (*)

Content and learning outcomes

Course contents

The course gives an in-depth treatment of the modern theoretical tools that are used to design and analyse reinforcement learning (RL) algorithms. It contains an introduction to RL and to its classical algorithms, such as Q-learning and SARSA, and furthermore presents the rationale behind the design of the latest algorithms, such as those striking an optimal trade-off between exploration and exploitation. The course also covers algorithms used in the latest RL success stories, e.g., deep RL algorithms.

Markov chains, Markov decision processes (MDPs), dynamic programming with value and policy iteration, design of approximate controllers for MDPs, stochastic linear-quadratic control, the multi-armed bandit problem, RL algorithms (Q-learning, Q-learning with function approximation).
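
Value iteration, one of the dynamic-programming methods listed above, repeatedly applies the Bellman optimality update until the value function converges. A minimal sketch on a hypothetical two-state discounted MDP (the transition probabilities and rewards are made up for illustration):

```python
# Value iteration on a small discounted MDP (hypothetical numbers).
# P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.5), (0, 0.5)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in P}
for _ in range(500):  # Bellman optimality updates until (approximate) convergence
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Greedy policy extracted from the (near-)optimal value function
policy = {
    s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
    for s in P
}
print(V, policy)
```

Since the Bellman operator is a gamma-contraction, 500 iterations bring V within a negligible distance of the unique fixed point; policy iteration instead alternates policy evaluation and greedy improvement.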

Intended learning outcomes

After passing the course, the student should be able to

  • carefully formulate stochastic control problems as Markov decision process (MDP) problems, classify equivalent problems, and evaluate their tractability
  • state the principle of optimality for finite and infinite time horizon MDPs and solve MDPs by means of dynamic programming
  • derive solutions to MDPs using value and policy iteration
  • solve control problems for systems whose dynamics must be learnt, using the Q-learning and SARSA algorithms
  • explain the difference between on-policy and off-policy algorithms
  • develop and implement RL algorithms with function approximation (for example deep RL algorithms where the Q function is approximated by the output of a neural network)
  • solve bandit optimisation problems.
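
To make the Q-learning outcome above concrete, here is a tabular Q-learning sketch on a hypothetical deterministic five-state chain (reward 1 for reaching the last state; all numbers are illustrative, not course material). It is an off-policy method: the behaviour policy is epsilon-greedy while the update targets the greedy policy.

```python
import random

# Deterministic 5-state chain: action 1 moves right, action 0 moves left.
# Reaching the last state gives reward 1 and ends the episode.
N = 5

def step(s, a):
    s2 = min(N - 1, s + 1) if a == 1 else max(0, s - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2  # step size, discount, exploration rate

random.seed(1)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        if random.random() < eps:
            a = random.choice([0, 1])
        else:
            a = max((0, 1), key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # off-policy target: greedy over the next state's Q-values
        target = r if done else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])  # Q-learning update
        s = s2

greedy = [max((0, 1), key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(greedy)
```

SARSA differs only in the target: it uses the Q-value of the action actually taken next (on-policy) instead of the greedy maximum.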

Literature and preparations

Specific prerequisites

For non-programme students: 120 higher education credits and documented knowledge of English corresponding to English B.

Recommended prerequisites

No information inserted


No information inserted


No information inserted

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

A, B, C, D, E, FX, F

Examination

  • HEM1 - Homework 1, 1.0 credits, grading scale: P, F
  • HEM2 - Homework 2, 1.0 credits, grading scale: P, F
  • LAB1 - Lab 1, 1.0 credits, grading scale: P, F
  • LAB2 - Lab 2, 1.0 credits, grading scale: P, F
  • TENA - Written exam, 3.5 credits, grading scale: A, B, C, D, E, FX, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted


Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students find further information about the implementation of the course in the course room in Canvas. A link to the course room can be found under the tab Studies in the Personal menu at the start of the course.

Offered by

Main field of study

Electrical Engineering

Education cycle

Second cycle

Add-on studies

No information inserted


Alexandre Proutiere

Supplementary information

In this course, the EECS code of honor applies, see: