Till innehåll på sidan

Fanny Bergström: When graph theory meets unsupervised learning: the statistical properties of ‘spectral clustering’

Masters Thesis

Tid: To 2020-06-11 kl 09.45

Plats: Zoom, meeting ID: 901 405 086

Medverkande: Fanny Bergström

Ämnesområde: Mathematical Statistics

Handledare: Chun-Biu Li

Exportera till kalender

Abstract

Spectral clustering treats clustering as a graph partitioning problem, where clusters are constructed based on the commute time distance (CTD) between the nodes of the graph. The CTD combines the local connections between nearby points to establish the global connections among remote points, allowing us to detect shapes and intrinsic manifold structures buried in high dimensional data. Furthermore, the CTD is simply the Euclidean distance in the space spanned by the eigenvectors of the graph Laplacian. This implies that clusters can be easily detected using a classical clustering algorithm (e.g., k-means) when data is represented in this space. This thesis aims to scrutinize the statistical principles of this nonparametric clustering method, which is robust and manages to capture both local and global geometrical structures in the data. Properties of the CTD and its relations with the clustering structures of the data are investigated extensively.