Busra Tas: A Data-driven Fuzzy Clustering Based on Density Peaks Using Graph Distance

Time: Wed 2022-12-14 13.30

Location: Albano, Cramer room

Participating: Busra Tas


Clustering is an unsupervised learning method that discovers discrete structures in data, typically in a high-dimensional feature space. One main approach in cluster analysis is density-based clustering, which detects clusters from density variations in the feature space. A well-known and intuitive density-based method is Density Peak Clustering (DPC), proposed by Rodriguez and Laio [1], which does not require the number of clusters as input and can “partially” discover imbalanced clusters of arbitrary shape. However, its validation using the decision graph can be very subjective. Moreover, it is a hard clustering method that cannot handle strongly overlapping clusters.
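For readers unfamiliar with DPC, the two quantities plotted in its decision graph can be sketched as follows. This is a minimal illustration of the cutoff-kernel form described in [1], not the speaker's code: the function name `density_peaks`, the cutoff `d_c` value, and the toy points are assumptions made for this example.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) per point, using the cutoff-kernel form of DPC."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # delta_i: distance to the nearest point of strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead
        # (in this sketch, tied density maxima are all treated this way)
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Hypothetical toy data: two small, well-separated groups
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho, delta = density_peaks(points, d_c=0.12)
```

Points with both large `rho` and large `delta` are the candidate cluster centers; in the original method they are picked by eye from the decision graph, which is the subjective step mentioned above.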
In this study, we generalize DPC to overcome these limitations. First, we employ graph distance to correctly account for nonlinear cluster shapes, which significantly improves the detection of clusters with arbitrary shapes. Second, we automate the identification of cluster centers and of the number of clusters using a well-known validation index. Third, we introduce a fuzzy extension to cluster detection that allows us to objectively assign each data point a membership probability for each cluster. Finally, in contrast to other state-of-the-art clustering methods, such as DPC and DBSCAN [2], our method involves a minimal number of hyperparameters that the user must fix subjectively. In this talk, I will explain these generalizations and present the performance of the new method on test cases and real datasets.
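The abstract does not say how the graph distance is constructed. One common choice, assumed here purely for illustration, is the shortest-path length through a k-nearest-neighbour graph. The helper names `knn_graph` and `graph_distances`, the value `k=2`, and the half-circle data are all hypothetical.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as an adjacency dict {i: {j: weight}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def graph_distances(graph, source):
    """Dijkstra shortest-path lengths from source; unreachable nodes stay at inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical example: 21 points along a half circle of radius 1
arc = [(math.cos(i * math.pi / 20), math.sin(i * math.pi / 20)) for i in range(21)]
geo = graph_distances(knn_graph(arc, k=2), source=0)
```

For the two end points of the arc, the Euclidean distance is about 2, while the graph distance `geo[20]` follows the curve and is close to the arc length. This is the sense in which a graph distance can respect a nonlinear cluster shape.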

[1] Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344, 1492-1496.
[2] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), 226-231.