Machine Learning Methods for Image-based Phenotypic Profiling in Early Drug Discovery

Time: Wed 2024-06-12 14.00

Location: D3, Lindstedtsvägen 5, Stockholm

Language: English

Subject area: Computer Science

Doctoral student: Johan Fredin Haslum, Computational Science and Technology (CST)

Opponent: Professor Joakim Lindblad, Uppsala University; Department of Information Technology; Vi3; Image Analysis

Supervisor: Associate Professor Kevin Smith, Science for Life Laboratory (SciLifeLab), Computational Science and Technology (CST); Associate Professor Hossein Azizpour, Science for Life Laboratory (SciLifeLab), Robotics, Perception and Learning (RPL), SeRC - Swedish e-Science Research Centre; Erik Müllers, AstraZeneca; Karl-Johan Leuchowius, AstraZeneca

QC 20240520


In the search for new therapeutic treatments, strategies that make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of images of fluorescently stained cells, is a rich and effective means of capturing the morphological effects of potential treatments on living systems. Within these complex data lie biological insights and new therapeutic opportunities, but computational tools are needed to unlock them.

This thesis examines how machine learning can improve the utility and analysis of phenotypic screening data. It focuses on challenges specific to this domain: the lack of the reliable labels that supervised learning requires, and confounding factors that experimental variability often makes unavoidable. We explore transfer learning as a way to boost model generalization and robustness, analyzing how domain distance, initialization, dataset size, and architecture affect the usefulness of weights pre-trained on natural images when they are applied in biomedical contexts. Building on this, we investigate self-supervised pretraining for phenotypic image data, but find that applying it directly is inadequate in this context because it fails to differentiate between biological effects. To overcome this, we develop new self-supervised learning strategies designed to let the network disregard confounding experimental noise, improving its ability to discern the effects of different treatments. We further develop a technique that adapts a model trained for phenotypic profiling to new, unseen data without any labels or supervised learning. Using this approach, a general phenotypic profiling model can be readily adapted to data from different sites. Beyond these technical contributions, we show that bioactive compounds identified using the approaches in this thesis were subsequently confirmed in biological assays through replication in an industrial setting. Our findings indicate that, although phenotypic data and biomedical imaging present complex challenges, machine learning can play a pivotal role in making early drug discovery more efficient and effective.