FID3024 Systems for Scalable Machine Learning 7.5 credits

Recent advances in hardware and software systems have made it possible to train complex Machine Learning (ML) models on massive datasets. Examples include new generations of GPUs and open-source frameworks such as Apache Spark, TensorFlow, and Ray. Moreover, advances in parallelization, job scheduling, and robustness make it possible to build complex ML models more efficiently and at scale. This course provides a comprehensive survey of the latest trends in ML system design and presents different techniques for building such systems. It covers the main components of ML systems, from fundamental ML concepts to more advanced topics such as parallelization and robustness. Participants are required to reflect on the arrangement of different techniques, rules, and guidelines for building ML systems and to suggest possible extensions to the technology from their own research domains.

Information per course offering


Semester

Autumn 2025


Course syllabus FID3024 (Autumn 2020–)


Content and learning outcomes

Course disposition

The course consists of six modules, each covering a different research area in ML systems. Each module has two sessions: a lecture session and a discussion session. In the lecture session, the teacher introduces the context and gives an overview of the reading material for the week. The students then have a week to study the topic and go through the reading material. They are also required to submit a detailed review of the selected papers. During the discussion session, they review and discuss the topic and the papers in depth. The goal of this format is to build mastery of the material and to develop a deeper understanding of how to evaluate and review research. It is also intended to provide insight into how to write better papers and how to identify open research questions and needs for further research.

Course contents

The course covers the following topics, in this order:

Fundamental ML, e.g., generalization and back-propagation
Parallelization, e.g., data-parallel and model-parallel training
AutoML, e.g., hyperparameter optimization, meta-learning, and Neural Architecture Search (NAS)
Scheduling and optimization, e.g., model compression and gradient compression
Robust learning, e.g., Byzantine-resilient learning
ML platforms, e.g., TensorFlow, Ray, and MLlib

Intended learning outcomes

After passing the course, students should be able to:

Demonstrate a systematic understanding of ML systems and the capacity to analyze and critique their components in a scholarly manner.
Reflect on the ideas and technologies related to ML systems with insight into their possibilities and limitations.
Examine how ML systems are currently used and evaluate how they can be used for new purposes and in different application domains.
Identify the need for further knowledge in improving ML systems.

Literature and preparations

Specific prerequisites

Enrolled as a doctoral student.

Recommended prerequisites

The course is mainly aimed at PhD students in the computer science, information and communication technology, and electrical engineering doctoral programmes, as well as all other PhD students who are interested in the architecture and fundamentals of modern ML systems. Students should be familiar with the basics of ML and distributed systems, and should have good programming skills, especially in Python or Scala.

Literature

You can find information about course literature either in the course memo for the course offering or in the course room in Canvas.

Examination and completion

Grading scale

P, F

Examination

EXA1 - Examination, 7.5 credits, grading scale: P, F

Based on a recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with a documented disability.

The examiner may apply another examination format when re-examining individual students.

If the course is discontinued, students may request to be examined during the following two academic years.

Other requirements for final grade

The course is assessed with a Pass/Fail grade, based on active participation in the discussion meetings and a scientifically sound review report each week. In addition, a passing student must attend at least 75% of all lectures and 75% of all student presentation sessions.

Examiner

Amir Hossein Payberah

Ethical approach

All members of a group are responsible for the group's work.
In any assessment, every student shall honestly disclose any help received and sources used.
In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students can find further information about the implementation of the course in the course room in Canvas. A link to the course room can be found under the tab Studies in the Personal menu at the start of the course.

Offered by

EECS/Software and Computer Systems

Education cycle

Third cycle

