Novel likelihood-based inference techniques for sequential data with medical and biological applications
Time: Fri 2022-05-13 at 13.00
Location: Ka-Sal C (Sven-Olof Öhrvik), Kistagången 16, Kista
Respondent: Negar Safinianaini, Software and Computer Systems, SCS, KTH School of Electrical Engineering and Computer Science
Opponent: Morris Quaid
Supervisor: Henrik Boström, Software and Computer Systems, SCS
The probabilistic approach is crucial in modern machine learning, as it provides transparency and quantification of uncertainty. This thesis is concerned with probabilistic building blocks, namely probabilistic graphical models (PGMs), combined with standard deterministic approximate inference, namely Expectation-Maximization (EM) and Variational Inference (VI). The contributions are improvements to parameter learning with EM and, most importantly, novel probabilistic models and a new VI methodology for phylogenetic inference.
Firstly, this thesis improves upon the vanilla EM algorithm for hidden Markov models (HMMs) and mixtures of HMMs (MHMMs). The proposed constrained EM algorithm for HMMs compensates for the lack of long-range context in HMMs. The two other proposed regularized EM algorithms find better local optima when learning the parameters of MHMMs, particularly in cancer analysis. The novel EM algorithms are merely modifications of the standard EM algorithm and add no extra complexity, unlike other modifications that target the context and poor local optima issues.
Secondly, this thesis introduces one novel and one augmented PGM, together with VI frameworks for robust and fast Bayesian inference. The first method, CopyMix, uses a single-phase framework to simultaneously provide clonal decomposition and copy number profiling of single-cell cancer data. In contrast to previous approaches, it does not pursue the two objectives in a sequential and ad hoc fashion, which is prone to introducing artifacts and errors. The second method provides an augmented PGM with a faster framework for phylogenetic inference; specifically, a novel natural gradient-based VI algorithm is devised.
Regarding the cancer analysis, this thesis concludes that CopyMix is superior to MHMMs, even though the two novel EM algorithms proposed in this thesis partially improve the performance of clonal tumor decomposition.
The empirical support presented throughout this thesis confirms that the proposed likelihood-based methods and optimization tools provide opportunities for better analysis algorithms, particularly suited for cancer research.