Markov chains, Markov Decision Process (MDP), Dynamic Programming and value / policy iteration methods, design of approximate controllers for MDPs, stochastic linear quadratic control, Multi-Armed Bandit problems.
EL2800 Stochastic Control and Optimization 7.5 credits
This course has been discontinued.
Last planned examination: Spring 2021
Decision to discontinue this course:
No information insertedContent and learning outcomes
Course contents
Intended learning outcomes
This course introduces basic theories and methodologies for the analysis and the design of stochastic control policies. It provides a comprehensive introduction to stochastic control, with applications taken from a variety of areas including advertising, dynamic resource allocation, and traditional automatic control. After the course, you should be able to:
- Understand basic principles of probability theory and stochastic dynamical systems including Markov chains.
- Rigorously formulate stochastic control problems as Markov Decision Process (MDP) problems, classify the corresponding problems, and evaluate their tractability.
- State the principle of optimality in finite-time and infinite-time horizon MDPs, and solve MDPs using dynamic programming.
- Derive solutions of MDPs using the value and policy iteration methods.
- Propose approximate solutions of MDPs.
- Treat MDP extensions such as constrained MDPs, Partially Observable MDPs, and distributed MDPs.
- State limit theorems expressing the relationship between MDPs and deterministic continuous-time control problems.
- Solve Linear Quadratic stochastic control problems.
- Solve basic optimal stopping time problems.
- Identify and formulate stochastic Multi-Armed Bandit (MAB) problems; derive regret lower bounds for MAB problems.
- Solve basic online stochastic optimization problems.
- Propose algorithms for adversarial MAB problems, and sequential decisions in repeated games.
Literature and preparations
Specific prerequisites
For single course students: 120 credits and documented proficiency in English B or equivalent.
Recommended prerequisites
Equipment
Literature
Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley.
Bertsekas, Dynamic Programming and Optimal Control, vol. 1, Athena Scientific.
Bubeck and Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Now publisher, Foundations and trends in machine learning, 2012
Examination and completion
If the course is discontinued, students may request to be examined during the following two academic years.
Grading scale
Examination
- HEM1 - Homework 1, 1.0 credits, grading scale: P, F
- HEM2 - Homework 2, 1.0 credits, grading scale: P, F
- LAB1 - Lab 1, 1.0 credits, grading scale: P, F
- LAB2 - Lab 2, 1.0 credits, grading scale: P, F
- TEN1 - Exam, 3.5 credits, grading scale: A, B, C, D, E, FX, F
Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.
The examiner may apply another examination format when re-examining individual students.
Other requirements for final grade
· H1: Homework, 1, grade scale: P/F
· LAB1: Computer lab 1, 1, grade scale: P/F
· LAB2: Computer lab 2, 1, grade scale: P/F
· TEN1: Written exam, grade scale: A, B, C, D, E, FX, F
Opportunity to complete the requirements via supplementary examination
Opportunity to raise an approved grade via renewed examination
Examiner
Ethical approach
- All members of a group are responsible for the group's work.
- In any assessment, every student shall honestly disclose any help received and sources used.
- In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.