# Workshops and conferences

## Workshop on Mathematics for Complex Data, June 24-26, 2019

The purpose of this workshop is to bring together researchers interested in the mathematics of complex data. There will be talks on mathematical theory and methods related to data analysis and artificial intelligence.

### Speakers

Wojciech Chacholski, KTH

Henrik Hult, KTH

Johan Jonasson, Chalmers

Annika Lang, Chalmers

Joel Larsson, Warwick

Konstantin Mischaikow, Rutgers

Anna Persson, KTH

Daniel Persson, Chalmers

Thomas Schön, Uppsala

Martina Scolamiero, KTH

Natasa Sladoje, Uppsala

Liam Solus, KTH

Tatyana Turova, Lund

Guo-Jhen Wu, Brown

### Registration

Please register here for the workshop (registration is free of charge).

### Schedule

*Monday, June 24*

10.30-11.00 Registration

11.00-11.10 Welcome

11.10-11.55 Daniel Persson, Chalmers

11.55-14.00 Lunch

14.00-14.45 Thomas Schön, Uppsala

14.45-15.30 Natasa Sladoje, Uppsala

15.30-16.00 Coffee

16.00-16.45 Henrik Hult, KTH

*Tuesday, June 25*

9.15-10.00 Guo-Jhen Wu, Brown

10.00-10.15 Coffee

10.15-11.00 Konstantin Mischaikow, Rutgers

11.00-11.45 Wojciech Chacholski, KTH

11.45-14.00 Lunch

14.00-14.45 Martina Scolamiero, KTH

14.45-15.30 Annika Lang, Chalmers

15.30-16.00 Coffee

16.00-16.45 Anna Persson, KTH

*Wednesday, June 26*

8.30-9.15 Liam Solus, KTH

9.15-10.00 Tatyana Turova, Lund

10.00-10.15 Coffee

10.15-11.00 Joel Larsson, Warwick

11.00-11.45 Johan Jonasson, Chalmers

### Location

Room U31, Brinellvägen 28A, KTH Campus, Valhallavägen, Stockholm.

### Talks

**Daniel Persson, Chalmers**

**Title:** Quantum deep learning

**Abstract:** Despite the overwhelming success of deep neural networks we are still at a loss for explaining exactly how deep learning works, and why it works as well as it does. What are the fundamental principles underlying deep learning? In recent years there have been various intriguing proposals for how results from physics and mathematics may shed light on these principles. In this talk I will describe some of these exciting developments, including topics such as renormalization, quantum entanglement and gauge equivariant convolutional neural networks.

**Thomas Schön, Uppsala**

**Title:** Probabilistic modelling - driven by data, guided by physics

**Abstract:** In this talk we want to show that the combined use of data-driven modelling and existing scientific knowledge can be quite rewarding. We illustrate this using concrete examples from physics, including modelling the ambient magnetic field, neutron diffraction experiments aiming to reconstruct the strain field, and computed tomographic (CT) reconstruction. These are all concrete examples where physics provides us with linear operator constraints that need to be fulfilled (first example) or alternatively measurements constituted by line integrals (last two). The Gaussian process is one of the most widely used models within machine learning, even though it has to some extent been overshadowed by deep neural networks in recent years. The reason for the usefulness of the Gaussian process is that it offers a probabilistic and non-parametric model of nonlinear functions. When these properties are combined with basic existing scientific knowledge of the phenomenon under study, we have a useful mathematical tool capable of fusing existing knowledge with new measured data. Besides briefly introducing the Gaussian process, we will also provide some insights into how we can adapt it so that it obeys linear operator constraints (including ODEs, PDEs and integrals), motivated for example by the specific examples above. Towards the end we will also (very briefly) sketch how the Gaussian process can incorporate deep neural networks to further enhance its flexibility. These developments open the way for the use of basic scientific knowledge within one of our classic machine learning models.

**Natasa Sladoje, Uppsala**

**Title:** Distance functions for robust image registration: Theory and applications

**Abstract:** Analysis of visual data - digital images - provides numerous challenges. At the same time, the amount of such data to be analysed nowadays increases rapidly and overwhelmingly in a range of application fields. This talk will focus on a few fundamental tasks in image processing which require comparison of images, our recently proposed solutions which rely on robust and efficient image distance functions, underlying mathematical tools, as well as some successful applications, primarily within biomedicine.

**Henrik Hult, KTH**

**Title:** Latent variable generative models

**Abstract:** Latent variable models use hidden variables to describe unobserved abstractions of data. I will discuss a variety of latent variable models and their applications in population genetics and machine learning. Emphasis will be on asymptotic properties related to sampling, interpolation, and goodness-of-fit.

**Guo-Jhen Wu, Brown**

**Title:** Optimal temperature selection for infinite swapping in the low temperature limit

**Abstract:** Parallel tempering, also known as replica exchange, is an algorithm used to speed up the convergence of slowly converging Markov processes (corresponding to lower temperatures for models from the physical sciences). By constructing other processes with higher temperature and allowing Metropolis type swaps between the different processes, the original process is able to explore the state space more efficiently via the swapping mechanism. It has been proven that by sending the swap rate to infinity, the sampling properties reach optimality in a certain sense. Moreover, this "infinite swapping limit" is attained by a process with symmetrized dynamics, which when combined with a weighted empirical measure provides approximations to the original problem. After discussing the construction, we focus on optimizing variance with respect to selection of the temperatures. As will be discussed, there are two main contributions to variance reduction. The first one comes from a lowering of energy barriers and consequent improved communication properties. The second and less obvious source is the weights appearing in the weighted empirical measure. These two variance reduction mechanisms behave in opposite ways as the temperatures vary. Based on an extension of Freidlin-Wentzell theory, we are able to identify the best temperature sequence for certain models when the lowest temperature is sent to zero, i.e., when sampling is most difficult.

**Konstantin Mischaikow, Rutgers**

**Title:** Conley Theory, Dynamics, and Data

**Abstract:** In this talk I will provide a brief review of Conley theory from the perspective of it being a combinatorial and topological extension of Morse theory. The focus of the lecture will be on how Conley theory can be used to derive conclusions about the structure of continuous nonlinear dynamical systems from finite data.

**Wojciech Chacholski, KTH**

**Title:** What is persistence?

**Abstract:** How to give a machine a sense of geometry? There are two aspects of what a sense is: a technical tool and the ability to learn to use it. This learning ability is essential. For example, we are born with the technical ability to detect smells and through our lives we develop it, depending on needs and the environment around us. In my talk I will describe how to use homology to give a machine a sense of geometry.

**Martina Scolamiero, KTH**

**Title:** Multi-parameter persistence and noise

**Abstract:** Topology has recently been used to study spaces arising from data, leading to important results in several fields such as neuroscience, cancer research, and material science. By using a method called multi-parameter persistence one can define new topological signatures that represent correlation patterns among different distances on a dataset. In this talk, I will describe our strategy to define such signatures for data and show that they are robust to noise and amenable to statistics. I will then focus on the computational challenges associated with the multi-parameter setting and outline directions for further research.

**Annika Lang, Chalmers**

**Title:** Deep learning and stochastic partial differential equations: a possible connection

**Abstract:** Neural networks are getting deeper and deeper. Letting the number of layers go to infinity, we can interpret them as discretizations of a time continuous problem. In this talk, I will discuss this idea and give a connection to the simulation of stochastic partial differential equations.

**Anna Persson, KTH**

**Title:** A multiscale method for Maxwell's equations

**Abstract:** In this talk I want to introduce you to so-called multiscale problems and the numerical challenges we face when modeling them. Problems of multiscale type typically appear when simulating physical behavior in heterogeneous media, such as composite materials and porous structures. This results in partial differential equations with highly varying and oscillating coefficients. It is well known that classical polynomial-based finite element methods fail to approximate the solution to such equations well, unless the mesh width is sufficiently small to resolve the data. This leads to issues with computational cost and available memory, which calls for new approaches and methods. In the first part of this talk I will present a multiscale method based on localized orthogonal decomposition, which uses information from the data to enrich the finite element space. In the second part of the talk I will explain how to apply this method to Maxwell's equations. These equations are of interest when simulating electromagnetic wave propagation in heterogeneous materials, such as photonic crystals and other metamaterials.

**Liam Solus, KTH**

**Title:** Discrete Geometry in Model Discovery

**Abstract:** In today's world, where data is an abundant resource, there is a strong interest in data-driven algorithms for model discovery that are both efficient and reliable. Of particular interest are such algorithms for learning probabilistic, or even causal, DAG models. Historically, the combinatorics of graphs has played a central role in the development of DAG model learning algorithms. However, ideas from contemporary geometric and algebraic combinatorics were previously uninvolved. In this talk, we will discuss a new method for probabilistic DAG model discovery. This method arises as a simplex-type algorithm over convex polytopes known as generalized permutohedra, which are central to the field of algebraic and geometric combinatorics. We will see that, when compared with the state-of-the-art, these methods perform competitively, are provably more reliable, and even shed some restricting parametric assumptions. We will then discuss how these methods extend to give the first ever consistent greedy algorithms for learning causal DAG models, and examine their performance on real data coming from genomics and computational biology.

**Tatyana Turova, Lund**

**Title:** Random graph models in Neuroscience

**Abstract:** Nearly all contemporary analyses of empirical data on neuronal networks appeal to graph theory for its very relevant terminology and methodology. Neuroscience in turn gave rise to new models in statistical physics, helping to better understand the issue of phase transitions in large networks.

We provide an overview of the graph theory results relevant to the analysis and modelling of neuronal networks and their functioning. To begin with the structure, we discuss models of growing random graphs and closely related models of geometric random graphs. To address the issue of functioning of neuronal networks, we consider bootstrap percolation processes. In particular, we show that the threshold of connectivity for an auto-associative memory in a Hopfield model on a random graph coincides with the threshold for the bootstrap percolation on the same random graph. We shall argue that this coincidence reflects the relations between the auto-associative memory mechanism and the properties of the underlying random network structure.

**Joel Larsson, Warwick**

**Title:** Biased and polarized random k-SAT

**Abstract:** Random SAT problems were first studied as a means to understand the average-case complexity of finding solutions to a boolean formula in n variables, but have since become one of the most intensively studied intersections of combinatorics, computer science, and physics. Such formulae typically exhibit a threshold phenomenon: as more (random) constraints are added, a formula rapidly switches from being satisfiable to unsatisfiable. This switch occurs at some critical density of constraints per variable, and much work has been devoted to nailing down such critical densities for various models. In the classic random k-SAT model there is a balance between TRUE and FALSE, in that each variable is equally likely to be assigned either value. This is not necessarily true of real-world SAT instances, and we study the critical densities in two models where this balance is broken: biased random k-SAT and polarized random k-SAT. Joint work with Klas Markström.

**Johan Jonasson, Chalmers**

**Title:** Mislabeling in statistical learning

**Abstract:** Mislabeling is a notorious problem in statistical learning and can have a great impact on the generalization performance of neural network classifiers. We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the network and an optimal regularized reweighting of training observations. We show that there is a synergy between these two penalties such that our combined penalty outperforms the standard regularized neural network. Moreover, we show that the combined penalty improves the generalization classification performance even in the absence of mislabeling. We provide a mathematical proof to explain the observed synergy between the two regularizations, derived for a simple two-layer network; the combined penalty method strictly outperforms a standard regularized neural net, both in finding a true structure in data space and in avoiding the erroneous identification of a structure where there is none.

We demonstrate our double regularized neural network, DRNet, on synthetic data of varying levels of difficulty. We also illustrate DRNet performance on the FMNIST data set with simulated mislabeling. These demonstrations show not only that there is indeed a synergy between the two penalties but also that the relationship depends on the level of contamination of the training labels. We observe that observation reweighting through penalization provides classification models that are more robust against overfitting as well as less sensitive to the selection of hyper-parameters. This last finding provides strong support for DRNet as a practical off-the-shelf classifier since costly hyper-parameter tuning can be much reduced.

### Past events

## Workshop on Mathematics for Complex Data, May 30-31, 2018

The purpose of this workshop is to bring together researchers interested in the mathematics of complex data. There will be talks on mathematical methods for data analysis as well as presentations of complex data in applications.

## Location

The Brummer & Partners MathDataLab: Workshop on Mathematics for Complex Data takes place in room K1, Teknikringen 56, floor 3, KTH Campus.

## Registration

Please register here for the workshop (registration is free of charge).

## Program

**May 30th**

13.00 - 13.45 Alexandre Proutiere, KTH, Automatic Control

13.45 - 14.30 Atsuto Maki, KTH, RPL

14.30 - 15.00 Coffee Break

15.00 - 15.45 Salla Franzen, SEB

16.00 - 17.00 Reception

**May 31st**

09.00 - 09.45 Anna Persson, Chalmers

09.45 - 10.30 Jens Berg, Uppsala

10.30 - 11.00 Coffee Break

11.00 - 11.45 Martina Scolamiero, EPFL

11.45 - 12.30 David Eklund, KTH Mathematics

## Presentations

**Alexandre Proutiere**, KTH, Automatic Control

*Title:* Inference from graphical data: fundamental limits and optimal algorithms

*Abstract:* We investigate the problem of cluster recovery in random graphs generated according to models extending the celebrated Stochastic Block Model. To reconstruct the clusters, we sequentially sample the edges of the graph, either randomly or in an adaptive manner or following a random walk on the graph. With a given sample budget, the objective is to devise a clustering algorithm that recovers the hidden clusters with the highest possible accuracy. We develop a generic method to derive a tight upper bound for the reconstruction accuracy (satisfied by any algorithm), and inspired by this fundamental limit, devise asymptotically optimal clustering algorithms. We further study the design of clustering algorithms with limited memory and computational complexity.

**Atsuto Maki**, KTH, RPL

*Title:* Transfer learning and multi-task learning in deep convolutional networks

*Abstract:* Deep Convolutional Networks (ConvNets) have become prevalent in computer vision in the last several years. The talk is about Transfer Learning and Multi-Task Learning in ConvNets, which we have been studying at the Robotics, Perception, and Learning (RPL) Lab. We will first look at the utility of global image descriptors given by ConvNets for visual recognition tasks in the context of transfer learning. Then we will turn to Multi-Task Learning for the tasks of semantic segmentation and object detection, which would in general involve some challenges in designing a global objective function. Time permitting, we will also visit the topic of robot learning with the new Deep Predictive Policy Training using Reinforcement Learning.

**Salla Franzen**, SEB

*Title:* Data-driven banking

*Abstract:* The amount of data available for harvesting and exploring increases every second. In finance the time has come to leverage the knowledge, expertise and experience from financial experts and combine them with the new technologies and open software programming methodologies available. This talk will focus on some examples of applications of these new technologies and methodologies to big data sets in finance and on the enormous potential of collaborations between academics and industry experts.

**Anna Persson**, Chalmers

*Title:* A multiscale method for parabolic equations

*Abstract:* We study numerical solutions for parabolic equations with highly varying (multiscale) coefficients. Such equations typically appear when modelling heat diffusion in heterogeneous media like composite materials. For these problems classical polynomial based finite element methods fail to approximate the solution well unless the mesh width resolves the variations in the data. This leads to issues with computational cost and available memory, which calls for new approaches and methods. In this talk I will present a multiscale method based on localized orthogonal decomposition, first introduced by Målqvist and Peterseim (2014). The focus will be on how to generalize this method to time dependent problems of parabolic type.

**Jens Berg**, Uppsala, Mathematics

*Title:* Data-driven discovery of partial differential equations

*Abstract:* The current era is providing us with an abundance of high-quality data. A long-standing problem in the natural sciences is how to transform the observed data into a predictive mathematical model. In this talk we will use the recent advances in machine learning and deep learning to analyze complex data sets and discover their governing partial differential equations (PDEs). The method will be demonstrated for data sets which have been generated by known PDEs, and finally we will discuss some applications where traditional modeling by first physical principles is intractable.

**Martina Scolamiero**, EPFL

*Title:* Multivariate Methods in Topological Data Analysis

*Abstract:* In Topological Data Analysis we study the shape of data using topology. Similarly to clustering, shape characteristics highlight correlation patterns and describe the structure of the data, which can then be exploited for prediction tasks. Multivariate methods in TDA are especially interesting as they can be used to combine and study heterogeneous sources of information about a dataset. In this talk I will focus on multi-parameter persistence, a rich and challenging multivariate method. In particular, I will describe a framework that allows one to compute a new class of stable invariants for multi-parameter persistence. The key element underlying this novel approach is a metric defined by 'noise systems'. A filter function is usually chosen to highlight properties we want to examine in a dataset. Similarly, our new metric allows some features of datasets to be considered as noise. Examples of topological analysis on real world data will be presented throughout the talk, with a specific focus on applications to neuroscience and psychiatry.

**David Eklund**, KTH, Mathematics

*Title:* The algebraic geometry of bottlenecks

*Abstract:* I will talk about bottlenecks of algebraic varieties in complex affine space. The bottlenecks are lines which are normal to the variety at two distinct points. Such pairs of points, and the distance between them, are of major importance in the data analysis of real varieties. I will explain the relation to the so-called reach of a smooth variety which appears naturally in the context of topological data analysis. I will address two interlinked problems: the enumerative problem of counting the number of bottlenecks and the computational problem of formulating efficient numerical methods to compute bottlenecks.

## Opening workshop Nov 17, 2017

The opening of the Brummer & Partners MathDataLab takes place on Nov 17.

## Location

The opening workshop takes place at: Open Lab, Valhallavägen 79, Stockholm

## Registration

Please register for the workshop by Nov 10 (registration is free of charge, but space is limited).

## Program

**08:45 Breakfast**

**09:15 Welcome,** *Sigbritt Karlsson*, KTH president

**09:20 Presentation of Brummer & Partners MathDataLab,** *Henrik Hult*, KTH

**09:30-10:15 Randomized algorithms for large scale linear algebra and data analytics,** *Per-Gunnar Martinsson*, Oxford University

*Abstract.* The talk will describe how randomized projections can be used to effectively, accurately, and reliably solve important problems that arise in data analytics and large scale linear algebra. We will focus in particular on accelerated algorithms for computing full or partial matrix factorizations such as the eigenvalue decomposition, the QR factorization, etc. Randomized projections are used in this context to reduce the effective dimensionality of intermediate steps in the computation. The resulting algorithms execute faster on modern hardware than traditional algorithms, and are particularly well suited for processing very large data sets.

The algorithms described are supported by a rigorous mathematical analysis that exploits recent work in random matrix theory. The talk will briefly review some representative theoretical results.

**10:15-11:00 From RNA-seq time series data to models of regulatory networks,** *Konstantin Mischaikow*, Rutgers University

*Abstract.* We will describe a novel approach to nonlinear dynamics based on topological and combinatorial ideas. An important consequence of this approach is that it is both computationally accessible and allows us to rigorously describe dynamics observable at a fixed scale over large sets of parameter values. To demonstrate the value of this approach we will consider RNA-seq time series data and propose potential regulatory networks based on how robustly the network is capable of reproducing the observed dynamics.

**11:00-12:30 Lunch break** (lunch not included)

**12:30-13:15 What is persistence?** *Wojciech Chacholski*, KTH

*Abstract.* What does it mean to understand shape? How can we measure it and make statistical conclusions about it? Do data sets have shapes, and if so, how can we use their shape to extract information about the data? There are many possible answers to these questions. Topological data analysis (TDA) aims at providing some of them using homology. In my presentation, aimed at a broader audience, I will describe the essence of TDA. I will illustrate how TDA can be used to give a machine the intelligence to learn geometric shapes and how this ability can be used in data analysis.

**13:15-14:00 Some mathematical challenges in the analysis of complex data,** *Henrik Hult*, KTH

*Abstract.* In this talk I will give an overview of some recent advances in the analysis of complex data. The talk will emphasize questions related to training and architecture of neural networks and I will try to highlight some mathematical challenges in this field.