Fanny Bergström: When graph theory meets unsupervised learning: the statistical properties of ‘spectral clustering’
Masters Thesis
Time: Thu 2020-06-11 09.45
Location: Zoom, meeting ID: 901 405 086
Participating: Fanny Bergström
Subject area: Mathematical Statistics
Supervisor: Chun-Biu Li
Abstract
Spectral clustering treats clustering as a graph partitioning problem, where clusters are constructed based on the commute time distance (CTD) between the nodes of the graph. The CTD combines the local connections between nearby points to establish the global connections among remote points, allowing us to detect shapes and intrinsic manifold structures buried in high dimensional data. Furthermore, the CTD is simply the Euclidean distance in the space spanned by the eigenvectors of the graph Laplacian. This implies that clusters can be easily detected using a classical clustering algorithm (e.g., k-means) when data is represented in this space. This thesis aims to scrutinize the statistical principles of this nonparametric clustering method, which is robust and manages to capture both local and global geometrical structures in the data. Properties of the CTD and its relations with the clustering structures of the data are investigated extensively.