Course Review: High-throughput data analysis

The course, BB2491 High-throughput data analysis, is the core of MTLS education at KTH. How does BB2491 differ from the traditional biology lectures that we have had so far? In this blog I will talk a bit about why and how we should learn and master high-throughput science!

First, why study high-throughput science? It underpins a myriad of industrial applications as well as the frontiers of research. To name but a few:

  • Next Generation Sequencing (NGS) is the best illustration of generating and analysing high-throughput data
  • High-throughput compound screening dominates lead discovery in the manufacturing of small-molecule drugs
  • Traditional gene-targeting methods are sufficient for the analysis of Mendelian diseases, but diseases that involve a more complex interplay require collecting and analysing “bigger data”

In BB2491, the teaching follows a logical transition from theory to practice to a hands-on project.


In biology, high-throughput analysis can be split into three parts, namely Genomics, Transcriptomics, and Proteomics. Correspondingly, we have three professors, each responsible for one part:


Lukas Käll | Lars Arvestad | Olof Emanuelsson

In contrast to traditional biology lectures, we have no designated textbook; instead, we have 33 research papers or scientific reviews as mandatory reading material! It sounds a bit daunting in the beginning, but with the careful guidance of the teachers, as well as the fundamental building blocks from previous courses (Genomics, Proteomics, Bioinformatics), we are able to dive into the ocean of knowledge!

For example, Olof introduced the basic concept of RPKM in abundance estimation at the start of the transcriptomics part, and the part was concluded by four excellent students presenting cutting-edge RNA sequencing techniques.
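To give a feel for what RPKM (Reads Per Kilobase of transcript per Million mapped reads) does, here is a minimal sketch in Python. It is my own illustration, not material from the course: the gene names and read counts are invented toy numbers, and the function name `rpkm` is just a label I chose.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Dividing by gene length (in kb) and by sequencing depth (in millions
    of mapped reads) is the same as multiplying the raw count by 1e9 and
    dividing by (length in bp * total mapped reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Toy data (invented for illustration): gene -> (mapped reads, length in bp)
toy_genes = {
    "geneA": (500, 2000),
    "geneB": (500, 500),
}
total_mapped_reads = 10_000_000  # assumed library size

rpkm_values = {
    gene: rpkm(count, length, total_mapped_reads)
    for gene, (count, length) in toy_genes.items()
}

for gene, value in rpkm_values.items():
    print(f"{gene}: RPKM = {value:.1f}")
```

Both toy genes collect the same number of reads, but geneB is four times shorter, so its RPKM comes out four times higher. That length normalisation is the whole point of the measure.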


There are 4 computer labs throughout the course. The first purpose is to consolidate the central concepts acquired in the lectures, in particular the abstract statistical concepts that can pose difficulty to students with a biology background. For instance, in computer lab 1 we examine multiple hypothesis correction with our own hands; it is critical for statistical comparison when you test a large number of hypotheses at once.

A slide from the statistics lecture
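To make the idea of multiple hypothesis correction a bit more concrete, here is a small Python sketch. I am assuming the Benjamini-Hochberg procedure as the correction method purely for illustration; this post does not record which method the lab actually used, and the simulated p-values are made up.

```python
import random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here as one common correction method; it is
    not necessarily the method taught in the lab.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone (a smaller raw p never gets a larger adjusted p).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Simulate 1000 tests where the null hypothesis is true every time:
# under the null, p-values are uniformly distributed between 0 and 1.
random.seed(0)
null_p_values = [random.random() for _ in range(1000)]

alpha = 0.05
naive_hits = sum(p < alpha for p in null_p_values)
bh_hits = sum(q < alpha for q in benjamini_hochberg(null_p_values))

print(f"'Significant' without correction: {naive_hits}")
print(f"'Significant' after BH correction: {bh_hits}")
```

Every test here is a true null, so anything called significant is a false positive. Without correction, roughly 5% of the tests slip through by chance alone, which is exactly the problem the correction is meant to control.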

The second purpose of the computer labs is to familiarise ourselves with the working environment of bioinformaticians. Instead of working on local computers with a graphical interface, we were asked to work on a cluster from the command line. In those exercises we came across UPPMAX for the first time (now some of us are doing our master projects on it), and played around with common bioinformatics algorithms for alignment, assembly etc.


The project, which is the core of BB2491, aims at testing the competency of individual students, but it turned out to be a festival where our creativity exploded!

First, we were divided into groups of 4-5, and each group was assigned a unique topic that requires high-throughput analysis techniques. The projects vary a lot (I name just four below), but they have one thing in common: all of them deal with authentic data generated from high-throughput experiments (RNA-seq, microarray, mass spectrometry etc.).

ChIP-Seq analysis of ERα and ERβ chromatin

Spruce (Christmas tree) transcriptome assessment

Post-translational modification (PTM) identification for Cryo-EM

Cyanobacterial proteomics in response to light

The project lasts for 5 weeks, and each group is assigned a tutor for guidance, usually a post-doc working in a related research group. I am lucky to have three reliable teammates, Elinor, Ryno and Tanya, and a super helpful and patient post-doc, Vital.

Depending on the topic, groups apply different analysis techniques and software, and visualise the results with scripts in R or Python. After nearly a month of reading, data mining and coding, 10th January is the show day!

Poster gallery of our class:


All course activities are mandatory, but they can be either graded (A-F) or non-graded (P/F).


  1. Poster: teachers will evaluate the quality of the poster and listen to the students’ presentation as a measure of satisfactory understanding and application of high-throughput science
  2. Project diary: students keep a record of their contribution to the project in the form of an online diary (here is an example of my own project diary, if you are interested)


  1. Impromptu individual presentation of one high-throughput technique, once per term
  2. Duo-presentation of a high-throughput research paper (here are the slides made by me and my groupmate Mathius about quantification)
  3. Four reports, one after the completion of each computer lab
  4. Attendance of 5/6 lectures


I hope that you enjoyed reading this blog! You can also find my previous blogs about courses at MTLS:

Biophysical Chemistry | Introduction to Bioinformatics | Frontiers in Translational Life Sciences | Applied Communication in Life Science

And course reviews written by Carolina, our ambassador at Karolinska:

Genetics | Project in Molecular Life Science