Thi Trang Nguyen: Classifying medical datasets: Frequentist versus Bayesian approaches
Time: Wed 2023-09-20 10.00 - 10.45
Location: Meeting room 9, floor 2, house 1, Albano
Respondent: Thi Trang Nguyen
Supervisor: Taras Bodnar
In this thesis, we discuss five widely used statistical learning methods for classification tasks, including both frequentist approaches, such as logistic regression, support vector machines, decision trees, and random forests, and Bayesian approaches, such as Bayesian logistic regression. The theoretical part begins by exploring the motivations and ideas behind each method. Mathematical models and algorithms are presented to provide a comprehensive understanding of their working principles. We then look at the advantages and disadvantages inherent in each method. In the practical part, we test the methods extensively on three classification datasets: Breast Cancer Wisconsin (Diagnosis), Parkinson's Disease and Dermatology. Each resulting model is evaluated on its ability to accurately predict, in a medical diagnosis setting, whether the patient is healthy or diseased. We achieve highly accurate classification results and then assess which model performs best on each dataset.
Keywords: Supervised Learning, Classification, Logistic Regression, Bayesian Logistic Regression, Support Vector Machines, Decision Trees, Random Forests, Breast Cancer, Parkinson's Disease, Dermatology.