Topics on Large Deviations in Artificial Intelligence
Time: Fri 2023-10-27 13.00
Subject area: Applied and Computational Mathematics, Mathematical Statistics
Doctoral student: Adam Lindhe, Matematisk statistik
Supervisor: Henrik Hult, Matematisk statistik; Jimmy Olsson, Matematisk statistik
Artificial intelligence has become one of the most important fields of study over the last decade, with applications in the medical sciences, autonomous vehicles, finance and everyday life. Analysing the convergence and stability of the algorithms behind these applications is therefore of utmost importance. One way to analyse stability and speed of convergence is through large deviations theory, in which a rate function characterises the exponential rate of convergence of stochastic processes. For example, by evaluating the rate function for stochastic approximation algorithms used to train neural networks, faster convergence can be achieved. This thesis consists of five papers that use ideas from large deviations theory to understand and improve specific machine-learning models.
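For orientation, the standard textbook formulation of a large deviation principle can be written as follows. The notation (the measures \mu_n, the space \mathcal{X}, the speed n) is the generic one and is an assumption here; the thesis may use a different parametrisation.

```latex
% A sequence of probability measures (\mu_n) on a topological space \mathcal{X}
% satisfies a large deviation principle with speed n and rate function
% I : \mathcal{X} \to [0, \infty] (lower semicontinuous) if
\begin{align*}
  \limsup_{n \to \infty} \frac{1}{n} \log \mu_n(F) &\le -\inf_{x \in F} I(x)
    && \text{for every closed set } F \subseteq \mathcal{X}, \\
  \liminf_{n \to \infty} \frac{1}{n} \log \mu_n(G) &\ge -\inf_{x \in G} I(x)
    && \text{for every open set } G \subseteq \mathcal{X}.
\end{align*}
```

Informally, the probability of a set A decays like \exp(-n \inf_{x \in A} I(x)), which is the sense in which the rate function describes an exponential rate of convergence.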
Paper I proves that a class of stochastic approximation algorithms satisfies a large deviation principle with a specific rate function. This class contains many interesting learning algorithms, such as stochastic gradient descent, persistent contrastive divergence and the Wang-Landau algorithm.
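To make the term concrete, a generic Robbins-Monro stochastic approximation recursion can be sketched as below, with stochastic gradient descent on a one-dimensional quadratic as a toy instance. This is only an illustration and not the setting or algorithm of Paper I: the function names, the target value TARGET_MEAN, the Gaussian noise model and the step sizes 1/(n+1) are all assumptions made for this example.

```python
import random


def stochastic_approximation(noisy_field, x0, n_steps, step_size, rng):
    """Run the recursion x_{n+1} = x_n + step_size(n) * noisy_field(x_n, rng).

    Returns the whole path [x_0, x_1, ..., x_{n_steps}].
    """
    x = x0
    path = [x]
    for n in range(n_steps):
        x = x + step_size(n) * noisy_field(x, rng)
        path.append(x)
    return path


# Hypothetical toy problem: minimise f(x) = E[(x - xi)^2] / 2
# with xi ~ N(TARGET_MEAN, 1), whose minimiser is TARGET_MEAN.
TARGET_MEAN = 2.0


def sgd_field(x, rng):
    """Negative stochastic gradient of the toy objective at x."""
    xi = rng.gauss(TARGET_MEAN, 1.0)
    return -(x - xi)


rng = random.Random(0)  # fixed seed so the run is reproducible
path = stochastic_approximation(
    sgd_field,
    x0=10.0,
    n_steps=5000,
    step_size=lambda n: 1.0 / (n + 1),  # decreasing Robbins-Monro steps
    rng=rng,
)
final_iterate = path[-1]
```

With these particular step sizes the iterate after n steps equals the running average of the first n noise samples, so final_iterate settles near TARGET_MEAN. A large deviation principle quantifies how unlikely it is for such a path to stray far from its limiting behaviour.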
Analysing the rate function from Paper I is not straightforward. In Paper II, we use tools from weak KAM theory to characterise the rate function. The rate function takes the form of a Lagrangian and can be evaluated by calculating the viscosity solution to the corresponding Hamilton-Jacobi equations. In Paper II, we also identify the projected Aubry set, a set of great importance when it comes to describing the viscosity solutions.
Papers III, IV and V all involve variational autoencoders (VAEs), a generative deep learning model with a latent space structure. In Paper III, we develop an evaluation metric for VAEs based on large deviations theory. The idea is to measure the difference between the induced empirical measure and the prior on the latent space. This is done by training an adversarial deep neural network and proving a modified version of Sanov's theorem.
Using the adversarial network from Paper III, we develop a stochastic interpolation algorithm for VAEs in Paper IV. The interpolation uses bridge processes and the adversarial network to construct paths that both respect the prior and yield high-quality interpolations.
Finally, in Paper V, a clustering algorithm is introduced. The VAE induces a probability distribution on the data space, and in this paper, we introduce an algorithm to estimate the gradient of the distribution. This leads to a stochastic approximation algorithm that gathers data in clusters.