Tobias Wängberg: From Cubism to Realism in Data Visualization: Graph Based Stochastic Neighbour Embedding for Unveiling Hierarchical and Nonlinear Structures
Time: Wed 2021-05-19 14.00 - 14.45
Location: Zoom, meeting ID: 611 3329 7865
Lecturer: Tobias Wängberg
The t-distributed Stochastic Neighbour Embedding (t-SNE) has emerged as one of the leading embedding methods for visualising high-dimensional (HD) data in a wide variety of fields, with applications such as immune profiling of COVID-19 patients and revealing cluster structures in HD images and single-cell transcriptomics data. However, several shortcomings of t-SNE have been identified. Specifically, t-SNE often fails to correctly represent hierarchical relationships between clusters, and spurious patterns may arise in the visualisations due to incorrect hyper-parameter settings, which can result in a distorted 'cubistic' embedding of the underlying data structure.
In this talk I will begin by presenting the intuitive ideas behind the t-SNE method, followed by a survey of its limitations. To provide a more precise, 'realistic' embedding of the data, we propose combining t-SNE with shape-aware graph distances to mitigate the shortcomings mentioned above. For quantitative validation, I will use simulated examples to show the significant improvements achieved by the graph-based t-SNE in visualising imbalanced and non-linear clusters, as well as in preserving hierarchical structures. Moreover, we propose a data-driven hyper-parameter setting, different from previously suggested ones, which we find to be consistently optimal across all the test cases examined. Lastly, I will demonstrate the superior performance of the graph-based t-SNE in visualisations of real data sets: the MNIST images and single-cell transcriptomics gene expression data.
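
The abstract does not say which shape-aware graph distance is used. Purely as an illustration of the general idea, the sketch below assumes shortest-path lengths on a k-nearest-neighbour graph (the construction known from Isomap), which may differ from the speaker's method. The helper names knn_graph and graph_distances, the toy half-circle data and the choice k=2 are all hypothetical. The example only shows how a graph distance can follow the shape of the data where the straight-line Euclidean distance cuts across it.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights.

    Hypothetical helper: one simple way to obtain a graph that follows the data.
    """
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by their Euclidean distance to point i.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            adj[i][j] = d
            adj[j][i] = d  # symmetrise the graph
    return adj


def graph_distances(adj, source):
    """Shortest-path (Dijkstra) distances from `source` to every node."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (assumption): 21 points evenly spaced along a unit half circle.
n = 21
points = [
    (math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
    for t in range(n)
]

adj = knn_graph(points, k=2)
geodesic = graph_distances(adj, 0)[n - 1]        # distance along the graph
euclidean = math.dist(points[0], points[n - 1])  # straight-line distance

print(f"graph distance:     {geodesic:.3f}")
print(f"Euclidean distance: {euclidean:.3f}")
```

On this toy curve the graph distance between the two end points comes out close to the arc length of the half circle, while the Euclidean distance is the diameter. Under the assumption above, a full matrix of such graph distances would take the place of Euclidean distances as the input to t-SNE.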