ID2221 Data-Intensive Computing 7.5 credits

Data-intensiv databehandling

  • Education cycle

    Second cycle
  • Main field of study

    Computer Science and Engineering
  • Grading scale

    A, B, C, D, E, FX, F

Course offerings

Autumn 18 for programme students

Autumn 18 Doktorand (doctoral student) for single-course students

  • Periods

    Autumn 18 P1 (7.5 credits)

  • Application code


  • Start date


  • End date


  • Language of instruction


  • Campus

    Campus Kista

  • Tutoring time


  • Form of study


  • Number of places *

    1 - 1

    *) The course date may be cancelled if the number of admitted students is less than the minimum number of places. If there are more applicants than places, a selection will be made.

  • Course responsible

    Amir Payberah

    Sarunas Girdzijauskas

  • Teacher

    Amir Payberah

  • Target group

    For doctoral students at KTH

Intended learning outcomes

The course complements distributed systems courses, with a focus on processing, storing, and analyzing massive data. It prepares students for master's projects and Ph.D. studies in the area of data-intensive computing systems. The main objective of the course is to give students a solid foundation for understanding the large-scale distributed systems used to store and process massive data.

More specifically, after completing the course, the student will be able to:

  • explain the architecture and properties of the computer systems needed to store, search, and index large volumes of data
  • describe the different computational models for processing large data sets, both data at rest (batch processing) and data in motion (stream processing)
  • use various computational engines to design and implement nontrivial analytics on massive data
  • explain the different models for scheduling computational tasks and allocating resources to them on large computing clusters
  • elaborate on the tradeoffs involved in designing efficient algorithms for processing massive data in a distributed computing setting.

Course main content


  • Distributed file systems
  • NoSQL databases
  • Scalable messaging systems
  • Big Data execution engines: MapReduce, Spark
  • High-level queries and interactive processing: Hive and Spark SQL
  • Stream processing
  • Graph processing
  • Scalable machine learning
  • Resource management.




Examination

  • LAB1 - Programming Assignments, 3.0 credits, grading scale: P, F
  • TEN1 - Examination, 4.5 credits, grading scale: A, B, C, D, E, FX, F

Written examination. Laboratory tasks.

Offered by

EECS/Computer Science


Amir Payberah


Course syllabus valid from: Spring 2019.
Examination information valid from: Spring 2019.