Skip to main content
To KTH's start page To KTH's start page

Data Clustering Based on Random Space Partitioning

Docent lecture by Atsuto Maki

Time: Mon 2014-04-07 10.00

Location: Room 304, Teknikringen 14

Participating: Docent lecture by Atsuto Maki

Export to calendar

Clustering a large-scale data set is a challenging problem. In this lecture, we will introduce an attempt to solve such a problem, in particular, addressing the question of how to deal with data that are distributed in complex manifolds. Using a decision forest, we first generate multiple partitions of the same input space, one per tree. The partitions from all trees are merged by intersecting them, resulting in a partition of higher resolution. We construct a graph by assigning a node to each region and linking adjacent nodes, which turns the clustering problem in the feature space into a graph clustering task. We then solve it with the Markov cluster algorithm. The presented algorithm is able to capture non-convex structure while being computationally efficient, capable of dealing with large data sets. We show the performance on synthetic data as well as on applications such as image segmentation. The resulting partition also allows us to determine the cluster index for new query data.