# Brummer & Partners MathDataLab

Welcome to the home page of the Brummer & Partners MathDataLab. Here you can find information about the Lab and its activities.

## Upcoming events

### Seminar with Richard Davis, Columbia University, Oct 14 (joint with Mathematical Statistics)

Room F11, Monday Oct 14, 15.15-16.15

**Title:** The Use of Shape Constraints for Modeling Time Series of Counts

Abstract: For many formulations of models for time series of counts, a family of probability mass functions relating the observation Y_{t} at time t to a state variable X_{t} must be explicitly specified. Typical choices are the Poisson and negative binomial distributions. One of the principal goals of this research is to relax this parametric framework and assume that the requisite pmf is a one-parameter exponential family in which the reference distribution is unknown but log-concave. This class of distributions includes many of the commonly used pmfs. The serial dependence in the model is governed by specifying the evolution of the conditional mean process. The particular link function used in the exponential family model depends on the specification of the reference distribution. Using this semi-parametric model formulation, we are able to extend the class of observation-driven models studied in Davis and Liu (2016). In particular, we show there exists a stationary and ergodic solution to the state-space model. In this new semi-parametric framework, we compute and maximize the likelihood function over both the parameters associated with the mean function and the reference measure subject to a concavity constraint. On top of this we can “smooth” the pmf using the Skellam distribution in order to obtain an estimated distribution defined on all the non-negative integers. In general, the smooth version has better performance than existing methods. The estimators of the mean function and the conditional distribution are shown to be consistent and to perform well compared to a full parametric model specification. Further limit theory in other situations will be described. The finite sample behavior of the estimators is studied via simulation, and empirical examples are provided to illustrate the methodology. This is joint work with Jing Zhang and Thibault Vatter.

### Study group on Machine Learning and Numerical Analysis

With this study group we wish to explore some topics in machine learning with a focus on numerical analysis. The aim is to increase knowledge of and familiarity with various machine learning tools. There will be three seminars during the autumn:

- October 4, 14.15-16.00, Room F11. Anders Szepessy, KTH: "Convergence for learning neural networks by the stochastic gradient method".
- December 11, 13.15-15.00. Ozan Öktem, KTH: TBA
- Richard Tsai, University of Texas at Austin: TBA

Welcome!

Organizers: Anna Persson & Patrick Henning

## Past events

### Seminar with Sid Resnick, Cornell University, Sep 16 (joint with Mathematical Statistics)

Room F11, Monday Sep 16, 15.15-16.15

**Title:** Exploring Dependence in Multivariate Heavy Tailed Data

Abstract: We review a framework for considering multivariate data that could plausibly come from a multivariate power law. The framework is flexible enough to allow for multiple (even infinite) heavy tail regimes depending on the choice of scaling and the definition of an extreme region. We use the framework to explore extremal dependence between components using both graphical and analytical means. An example using market returns is given for what we call "strong dependence" and an exploratory graph technique can highlight the most dependent components.

### Seminar with Paul Jenkins, University of Warwick, Sep 12

Room F11, Thursday Sep 12, 15.15-16.15

**Title:** Asymptotic genealogies of interacting particle systems

Abstract: Interacting particle systems are a broad class of stochastic models for phenomena in disciplines including physics, engineering, biology, and finance. A prominent example in statistics is the particle filter, which is widely used in numerical approximation schemes for observations from hidden Markov models. A particle filter proceeds by evolving a discrete-time, weighted population of particles whose empirical measure approximates the distribution of the hidden process. In this talk I discuss how to characterise the genealogy underlying this evolving particle system. More precisely, under certain conditions we can show that the genealogy converges (as the number of particles grows) to Kingman's coalescent, a stochastic tree-valued process widely studied in population genetics. This makes explicit the analogy between a particle filter and an evolving biological population.

## Workshop on Mathematics for Complex Data, June 24-26, 2019

The purpose of this workshop is to bring together researchers interested in the mathematics of complex data. There will be talks on mathematical theory and methods related to data analysis and artificial intelligence.

### Schedule

*Monday, June 24*

10.30-11.00 Registration

11.00-11.10 Welcome

11.10-11.55 Daniel Persson, Chalmers

11.55-14.00 Lunch

14.00-14.45 Thomas Schön, Uppsala

14.45-15.30 Natasa Sladoje, Uppsala

15.30-16.00 Coffee

16.00-16.45 Henrik Hult, KTH

*Tuesday, June 25*

9.15-10.00 Guo-Jhen Wu, Brown

10.00-10.15 Coffee

10.15-11.00 Konstantin Mischaikow, Rutgers

11.00-11.45 Wojciech Chacholski, KTH

11.45-14.00 Lunch

14.00-14.45 Martina Scolamiero, KTH

14.45-15.30 Annika Lang, Chalmers

15.30-16.00 Coffee

16.00-16.45 Anna Persson, KTH

*Wednesday, June 26*

8.30-9.15 Liam Solus, KTH

9.15-10.00 Tatyana Turova, Lund

10.00-10.15 Coffee

10.15-11.00 Joel Larsson, Warwick

11.00-11.45 Johan Jonasson, Chalmers

### Location

Room U31, Brinellvägen 28A, KTH Campus, Valhallavägen, Stockholm.

### Talks

**Daniel Persson, Chalmers**

**Title:** Quantum deep learning

**Abstract:** Despite the overwhelming success of deep neural networks we are still at a loss for explaining exactly how deep learning works, and why it works as well as it does. What are the fundamental principles underlying deep learning? In recent years there have been various intriguing proposals for how results from physics and mathematics may shed light on these principles. In this talk I will describe some of these exciting developments, including topics such as renormalization, quantum entanglement and gauge equivariant convolutional neural networks.

**Thomas Schön, Uppsala**

**Title:** Probabilistic modelling - driven by data, guided by physics

**Abstract:** In this talk we want to show that the combined use of data-driven modelling and existing scientific knowledge can be quite rewarding. We illustrate this using concrete examples from physics, including modelling the ambient magnetic field, neutron diffraction experiments aiming to reconstruct the strain field, and computed tomographic (CT) reconstruction. These are all concrete examples where physics provides us with linear operator constraints that need to be fulfilled (first example) or alternatively measurements constituted by line integrals (last two). The Gaussian process is one of the most widely used models within machine learning, even though it has to some extent been overshadowed by the deep neural networks in recent years. The reason for the usefulness of the Gaussian process is that it offers a probabilistic and non-parametric model of nonlinear functions. When these properties are combined with basic existing scientific knowledge of the phenomenon under study we have a useful mathematical tool capable of fusing existing knowledge with new measured data. Besides briefly introducing the Gaussian process we will also provide some insights into how we can adapt it so that it obeys linear operator constraints (including ODEs, PDEs and integrals), motivated for example by the specific examples above. Towards the end we will also (very briefly) sketch how the Gaussian process can incorporate deep neural networks to further enhance its flexibility. These developments open up for the use of basic scientific knowledge within one of our classic machine learning models.

**Natasa Sladoje, Uppsala**

**Title:** Distance functions for robust image registration: Theory and applications

**Abstract:** Analysis of visual data - digital images - provides numerous challenges. At the same time, the amount of such data to be analysed nowadays increases rapidly and overwhelmingly in a range of application fields. This talk will focus on a few fundamental tasks in image processing which require comparison of images, our recently proposed solutions which rely on robust and efficient image distance functions, underlying mathematical tools, as well as some successful applications, primarily within biomedicine.

**Henrik Hult, KTH**

**Title:** Latent variable generative models

**Abstract:** Latent variable models use hidden variables to describe unobserved abstractions of data. I will discuss a variety of latent variable models and their applications in population genetics and machine learning. Emphasis will be on asymptotic properties related to sampling, interpolation, and goodness-of-fit.

**Guo-Jhen Wu, Brown**

**Title:** Optimal temperature selection for infinite swapping in the low temperature limit

**Abstract:** Parallel tempering, also known as replica exchange, is an algorithm used to speed up the convergence of slowly converging Markov processes (corresponding to lower temperatures for models from the physical sciences). By constructing other processes with higher temperature and allowing Metropolis type swaps between the different processes, the original process is able to explore the state space more efficiently via the swapping mechanism. It has been proven that by sending the swap rate to infinity, the sampling properties reach optimality in a certain sense. Moreover, this "infinite swapping limit" is attained by a process with symmetrized dynamics, which, when combined with a weighted empirical measure, provides approximations to the original problem. After discussing the construction, we focus on optimizing variance with respect to selection of the temperatures. As will be discussed, there are two main contributions to variance reduction. The first one comes from a lowering of energy barriers and consequent improved communication properties. The second and less obvious source is the weights appearing in the weighted empirical measure. These two variance reduction mechanisms behave in opposite ways as the temperatures vary. Based on an extension of Freidlin-Wentzell theory, we are able to identify the best temperature sequence for certain models when the lowest temperature is sent to zero, i.e., when sampling is most difficult.

**Konstantin Mischaikow, Rutgers**

**Title:** Conley Theory, Dynamics, and Data

**Abstract:** In this talk I will provide a brief review of Conley theory from the perspective of it being a combinatorial and topological extension of Morse theory. The focus of the lecture will be on how Conley theory can be used to derive conclusions about the structure of continuous nonlinear dynamical systems from finite data.

**Wojciech Chacholski, KTH**

**Title:** What is persistence?

**Abstract:** How to give a machine a sense of geometry? There are two aspects of what a sense is: a technical tool and the ability to learn to use it. This learning ability is essential. For example, we are born with the technical ability to detect smells, and throughout our lives we develop it, depending on our needs and the environment around us. In my talk I will describe how to use homology to give a machine a sense of geometry.

**Martina Scolamiero, KTH**

**Title:** Multi-parameter persistence and noise

**Abstract:** Topology has recently been used to study spaces arising from data, leading to important results in several fields such as neuroscience, cancer research, and material science. By using a method called multi-parameter persistence one can define new topological signatures that represent correlation patterns among different distances on a dataset. In this talk, I will describe our strategy to define such signatures for data and show that they are robust to noise and amenable to statistics. I will then focus on the computational challenges associated with the multi-parameter setting and outline directions for further research.

**Annika Lang, Chalmers**

**Title:** Deep learning and stochastic partial differential equations: a possible connection

**Abstract:** Neural networks are getting deeper and deeper. Letting the number of layers go to infinity, we can interpret them as discretizations of a time-continuous problem. In this talk, I will discuss this idea and give a connection to the simulation of stochastic partial differential equations.

**Anna Persson, KTH**

**Title:** A multiscale method for Maxwell's equations

**Abstract:** In this talk I want to introduce you to so-called multiscale problems and the numerical challenges we face when modeling them. Problems of multiscale type typically appear when simulating physical behavior in heterogeneous media, such as composite materials and porous structures. This results in partial differential equations with highly varying and oscillating coefficients. It is well known that classical polynomial-based finite element methods fail to approximate the solution to such equations well, unless the mesh width is sufficiently small to resolve the data. This leads to issues with computational cost and available memory, which calls for new approaches and methods. In the first part of this talk I will present a multiscale method based on localized orthogonal decomposition, which uses information from the data to enrich the finite element space. In the second part of the talk I will explain how to apply this method to Maxwell's equations. These equations are of interest when simulating electromagnetic wave propagation in heterogeneous materials, such as photonic crystals and other metamaterials.

**Liam Solus, KTH**

**Title:** Discrete Geometry in Model Discovery

**Abstract:** In today's world, where data is an abundant resource, there is a strong interest in data-driven algorithms for model discovery that are both efficient and reliable. Of particular interest are such algorithms for learning probabilistic, or even causal, DAG models. Historically, the combinatorics of graphs has played a central role in the development of DAG model learning algorithms. However, ideas from contemporary geometric and algebraic combinatorics were previously uninvolved. In this talk, we will discuss a new method for probabilistic DAG model discovery. This method arises as a simplex-type algorithm over convex polytopes known as generalized permutohedra, which are central to the field of algebraic and geometric combinatorics. We will see that, when compared with the state-of-the-art, these methods perform competitively, are provably more reliable, and even shed some restricting parametric assumptions. We will then discuss how these methods extend to give the first ever consistent greedy algorithms for learning causal DAG models, and examine their performance on real data coming from genomics and computational biology.

**Tatyana Turova, Lund**

**Title:** Random graph models in Neuroscience

**Abstract:** Nearly all contemporary analyses of empirical data on neuronal networks appeal to graph theory for its very relevant terminology and methodology. Neuroscience in turn gave rise to new models in statistical physics, helping to better understand the issue of phase transitions in large networks.

We provide an overview of the graph theory results relevant to the analysis and modelling of neuronal networks and their functioning. To begin with structure, we discuss models of growing random graphs and closely related models of geometric random graphs. To address the issue of functioning of neuronal networks, we consider bootstrap percolation processes. In particular, we show that the threshold of connectivity for an auto-associative memory in a Hopfield model on a random graph coincides with the threshold for the bootstrap percolation on the same random graph. We shall argue that this coincidence reflects the relations between the auto-associative memory mechanism and the properties of the underlying random network structure.

**Joel Larsson, Warwick**

**Title:** Biased and polarized random k-SAT

**Abstract:** Random SAT problems were first studied as a means to understand the average-case complexity of finding solutions to a boolean formula in n variables, but have since become one of the most intensively studied intersections of combinatorics, computer science, and physics. Such formulae typically exhibit a threshold phenomenon: as more (random) constraints are added, a formula rapidly switches from being satisfiable to unsatisfiable. This switch occurs at some critical density of constraints per variable, and much work has been devoted to nailing down such critical densities for various models. In the classic random k-SAT model there is a balance between TRUE and FALSE, in that each variable is equally likely to be assigned either value. This is not necessarily true of real-world SAT instances, and we study the critical densities in two models where this balance is broken: biased random k-SAT and polarized random k-SAT. Joint work with Klas Markström.

**Johan Jonasson, Chalmers**

**Title:** Mislabeling in statistical learning

**Abstract:** Mislabeling is a notorious problem in statistical learning and can have a great impact on the generalization performance of neural network classifiers. We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the network and an optimal regularized reweighting of training observations. We show that there is a synergy between these two penalties such that our combined penalty outperforms the standard regularized neural network. Moreover, we show that the combined penalty improves the generalization classification performance even in the absence of mislabeling. We provide a mathematical proof to explain the observed synergy between the two regularizations, derived for a simple two-layer network; the combined penalty method strictly outperforms a standard regularized neural net, both in finding a true structure in data space and in avoiding erroneously identifying a structure where there is none.

We demonstrate our double regularized neural network, DRNet, on synthetic data of varying levels of difficulty. We also illustrate DRNet performance on the FMNIST data set with simulated mislabeling. These demonstrations show not only that there is indeed a synergy between the two penalties but also that the relationship depends on the level of contamination of the training labels. We observe that observation reweighting through penalization provides classification models that are more robust against overfitting as well as less sensitive to the selection of hyper-parameters. This last finding provides strong support for DRNet as a practical off-the-shelf classifier since costly hyper-parameter tuning can be much reduced.

### Seminar with Michael Lesnick, SUNY, May 28

Room 3418, Tuesday May 28, 11.15-12.15

**Title:** Interleavings and Multi-parameter Persistent Homology

Abstract: In topological data analysis (TDA), we associate to data a diagram of topological spaces, which we then study using algebraic topology. Topologists have been studying diagrams of topological spaces for decades; mathematically, what sets TDA apart from classical work is that in TDA, we are interested primarily in "approximate relations" between diagrams of spaces and their invariants, rather than in exact relations. For example, we are typically more interested in whether two diagrams of spaces are close to one another in some suitably chosen metric than whether they are isomorphic (or weakly equivalent) on the nose. Much of my recent work has focused on *interleavings*, which have emerged as the formal language of choice in TDA for expressing such approximate relations. I've been especially interested in interleavings in the setting of multi-parameter persistent homology, where they can be used to formulate multi-parameter versions of fundamental stability and inference results in TDA. In this talk, I'll introduce interleavings and multi-parameter persistent homology, and discuss some recent results about these.

### Seminar with Stephan Zhechev, IST Austria, May 22 (joint with combinatorics seminar)

Room 3418, Wednesday May 22, 11.15-12.15

**Title:** Embeddability is undecidable outside the meta-stable range

Abstract: We will prove that the following question is algorithmically undecidable for k+3 < d < 3(k+1)/2 and k > 4, which covers essentially everything outside the meta-stable range: Given a finite simplicial complex K of dimension k and an embedding f : L -> R^d of a subcomplex L of K, can f be extended to an embedding F : K -> R^d of the whole complex? Here, we assume that the given embedding f of the subcomplex L is linear (on each simplex of L) whereas the desired embedding F of the whole complex is allowed to be piecewise-linear (i.e., linear on an arbitrarily fine subdivision of K); moreover F is not required to be part of the output.

More generally, we prove that the question of deciding embeddability, which is the special case of the question above when we set L to be empty, is also undecidable for k+3 < d < 3(k+1)/2 and k > 4.

The strategy of our proof is to construct a reduction from Hilbert’s tenth problem to both the embeddability and extension of embeddings problems. More specifically, for particular types of systems of quadratic Diophantine equations, we will show how to construct corresponding instances of the two problems we consider, so that an embedding or extension exists if and only if the original system of equations has an integer solution. This is joint work with Uli Wagner and Marek Filakovsky.

### Study group on Topological Data Analysis and Machine Learning

The aim of this study group is to explore and discuss links between these topics through recent research papers. We will present state-of-the-art techniques and focus on the approach of the TDA group at KTH. This study group will take place in room 3418 at the Mathematics department with the following schedule:

- Monday 28 January 10.00-12.00, talk by Martina Scolamiero, KTH
- Monday 4 February 10.00-12.00, talk by Steve Oudot, INRIA
- Monday 11 February 10.00-12.00, talk by Wojciech Chacholski, KTH
- Monday 18 February 10.00-12.00, talk by Oliver Gäfvert, KTH

### Workshop on Deep Learning and Inverse Problems, Jan 21-25, 2019

DLIP2019 is a one week workshop for researchers and practitioners working on deep learning techniques for inverse problems. The objective is to enable open discussions on both practical and theoretical aspects, and give researchers time to discuss these problems in depth.

The workshop will feature some invited talks, but we hope that most attendees will also contribute their own knowledge.

*Invited Speakers*: Ozan Öktem (KTH), Andreas Hauptmann (UCL), Sebastian Lunz (Cambridge).

See the workshop webpage for more information.

### Seminar with Josef Teichmann, ETH Zurich, Jan 18

Room F11, Friday Jan 18, 14.15-15.15

**Title:** Machine Learning in Finance

Abstract: We show three instances of machine learning in finance: deep hedging, deep calibration and deep simulation. The first two applications are direct applications of universal approximation theorems, in contrast to deep simulation, where Johnson-Lindenstrauss random projections are used to obtain expressive but tractable sets of trajectories.

### Seminar with Per-Gunnar Martinsson, University of Texas, Nov 26

Room F11, Monday Nov 26, 15.15-16.15

**Title:** Fast Direct Solvers for Elliptic PDEs

Abstract: That the linear systems arising upon the discretization of elliptic PDEs can be solved very efficiently is well-known, and many successful iterative solvers with linear complexity have been constructed (multigrid, Krylov methods, etc). Interestingly, it has recently been demonstrated that it is often possible to directly compute an approximate inverse to the coefficient matrix in linear (or close to linear) time. The talk will survey some recent work in the field and will argue that direct solvers have several advantages, including improved stability and robustness, and dramatic improvements in speed in certain environments. Moreover, the direct solvers being proposed have low communication costs, and are very well suited to parallel implementations.

### Seminar with Phyllis Wan, Rotterdam University, Oct 22

Room F11, Monday Oct 22, 15.15-16.15

**Title:** Modeling social networks through linear preferential attachment

Abstract: Preferential attachment is an appealing mechanism for modeling power-law behavior of degree distributions in social networks. In this talk, we consider fitting a directed linear preferential attachment model to network data under three data scenarios: 1) When the full history of the network growth is given, the MLE of the parameter vector and its asymptotic properties are derived. 2) When only a single-time snapshot of the network is available, an estimation method combining method of moments with an approximation to the likelihood is proposed. 3) When the data are believed to have come from a misspecified model or have been corrupted, a semi-parametric approach to model heavy-tailed features of the degree distributions is presented, using ideas from extreme value theory. We illustrate these estimation procedures and explore the usage of this model through simulated and real data examples. This is joint work with Tiandong Wang (Cornell), Richard Davis (Columbia) and Sid Resnick (Cornell).

### Seminar with Jonas Peters, University of Copenhagen, Oct 15

Room F11, Monday Oct 15, 15.15-16.15

**Title: Causality and data**

Abstract: Causality enters data science in different ways. The goal of causal discovery is to learn causal structure from observational data, an important but difficult problem. Several methods rely on testing for conditional independence. We prove that, statistically, this is fundamentally harder than testing for unconditional independence; solving it requires carefully chosen assumptions on the data generating process. In many practical problems, the focus may lie on prediction, and it is not necessary to solve (full) causal discovery. It might still be beneficial, however, to apply causality related ideas. In particular, interpolating between causality and predictability enables us to infer models that yield more robust prediction with respect to changes in the test set. We illustrate this idea for ODE based systems considering artificial and real data sets. The talk does not require any prior knowledge in causal inference. It contains joint work with Stefan Bauer, Niklas Pfister, and Rajen Shah.

### Workshop on Mathematics for Complex Data, May 30-31, 2018

The purpose of this workshop is to bring together researchers interested in the mathematics of complex data. There will be talks on mathematical methods for data analysis as well as presentations of complex data in applications.

See here for details.

### Two lectures by Scott Baden, Lawrence Berkeley National Laboratory and University of California, San Diego, May 2-3, 2018

Lecture 1: Room F11, Wednesday, May 2, 11.15-12.00

Lecture 2: Room F11, Thursday, May 3, 11.15-12.00

**Title: Scalable memory machines**

Abstract: Distributed memory computers provide scalable memory and - hopefully - scalable performance. Over two lectures, I'll present the principles and practice of applying scalable memory machines to solve scientific problems and describe my current research in addressing the challenges entailed in highly scalable computing.

Bio: Prof. Baden received his M.S. and PhD in Computer Science from UC Berkeley in 1982 and 1987. He is also Adjunct Professor in the Department of Computer Science and Engineering at UCSD, where he was a faculty member for 27 years. His research interests are in high performance and scientific computation: domain specific translation, abstraction mechanisms, run times, and irregular problems. He has taught parallel programming at both the graduate and undergraduate level at UCSD and at the PDC Summer School.

### Seminar with Jeffrey Herschel Giansiracusa

Room F11, Friday Feb 16, 9:00-10:00.

**Title: A tour of some applications of persistent homology**

Abstract: I will give an overview of persistent homology - how it is constructed and how we use it as a tool in data analysis. Originally it was popularised as a way of producing a description of the shape of a data set, but more recently it has taken on an alternative role as a component in functional data analysis pipelines where each element in a data set represents a complicated geometric object and persistent homology provides a way of comparing the topology and geometry of different elements, and potentially feeding the topology directly into statistical learning methods. I will describe how this works in some examples.

### Seminar with Caroline Uhler, MIT

Room F11, Wednesday Feb 7, 13.15-14.15

**Title: Your dreams may come true with MTP2**

Abstract: We study probability distributions that are multivariate totally positive of order two (MTP2). Such distributions appear in various applications from ferromagnetism to Brownian tree models used in phylogenetics. We first describe some of the intriguing properties of such distributions with respect to conditional independence and graphical models. In the Gaussian setting, these translate into new statements about M-matrices that may be of independent interest to algebraists. We then consider the problem of nonparametric density estimation under MTP2. This requires us to develop new results in geometric combinatorics. In particular, we introduce bimonotone subdivisions of polytopes and show that the maximum likelihood estimator under MTP2 is a piecewise linear function that induces bimonotone subdivisions. In summary, MTP2 distributions not only have broad applications for data analysis, but also lead to interesting new problems in combinatorics, geometry, and algebra.

Caroline Uhler joined the MIT faculty in 2015 as the Henry L. and Grace Doherty assistant professor in EECS and IDSS. She is a member of the LIDS, the Center for Statistics, Machine Learning at MIT, and the ORC. She holds a PhD in statistics from UC Berkeley. Her research focuses on mathematical statistics and computational biology, in particular on graphical models, causal inference and algebraic statistics, and on applications to learning gene regulatory networks and the development of geometric models for the organization of chromosomes.

### Opening workshop on Nov 17, 2017

For information, see