Methods for rapid phylogenetic inference and copy number variation detection from transcriptomics data

Time: Fri 2024-12-20 13.00

Location: F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm

Video link: https://kth-se.zoom.us/j/68990648769

Language: English

Subject area: Computer Science

Doctoral student: Semih Kurt, Computational Science and Technology (CST), Science for Life Laboratory, SciLifeLab

Opponent: Associate Professor Mohammed El-Kebir, University of Illinois at Urbana-Champaign

Supervisor: Professor Jens Lagergren, Computational Science and Technology (CST), Science for Life Laboratory, SciLifeLab

QC 20241129

Abstract

Computational biology leverages biological data and mathematical modeling to gain insights into biological systems and their relationships. A key example of widely used biological data is nucleotide sequences, obtained through DNA and RNA sequencing. Recent advances in sequencing technologies make it possible to obtain single-cell level DNA and RNA sequences through rapid, cost-efficient pipelines. This high-resolution data gives researchers an opportunity to investigate complex biological features and processes such as evolutionary relationships, developmental history, somatic mutations, disease progression, and tumor heterogeneity. However, technical noise and inherent biological randomness make it challenging to extract meaningful insights into these features and processes. The large size of single-cell datasets poses another obstacle. Consequently, there is an increasing need for scalable and robust computational methods that can fully exploit the recent expansion in both the type and quantity of sequencing data. In this thesis, we address this growing demand by proposing novel approaches for two key tasks in computational biology: phylogenetic reconstruction and copy number variation (CNV) inference.

First, we demonstrate how mixture components in variational autoencoders (VAEs) cooperate, adapting jointly to maximize the evidence lower bound (ELBO), effectively covering the target posterior distribution, and enhancing the latent-representation capabilities, which yields better cell type classification on single-cell transcriptomics datasets. Second, we introduce a VAE-based approach for CNV inference from single-cell transcriptomics data. Unlike previous methods, our method requires no cell-type-specific gene signatures, tumor-specific markers, or any other form of prior information, yet it delivers more accurate CNV estimates. Third, we propose a scalable and rapid method for phylogeny reconstruction using a sparse distance matrix, significantly reducing runtime for large datasets. Fourth, we present a deep learning-based method for simultaneous clonal deconvolution and CNV inference from spatial transcriptomics data, offering a detailed view of intra-tumor heterogeneity.

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-356909