
Novel Hessian-Based Algorithms for Unconstrained Optimization

Time: Thu 2024-05-23 15.00

Location: F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm

Language: English

Subject area: Applied and Computational Mathematics, Optimization and Systems Theory

Doctoral student: Erik Berglund, Reglerteknik

Opponent: Professor Frank Edward Curtis, Lehigh University

Supervisor: Mikael Johansson, Reglerteknik



Abstract

There are several benefits to taking the Hessian of the objective function into account when designing optimization algorithms. Compared to strictly gradient-based algorithms, Hessian-based algorithms usually require fewer iterations to converge, are generally less sensitive to parameter tuning, and handle ill-conditioned problems better. Yet they are not universally used, because adapting them to more demanding settings raises several challenges. This thesis deals with Hessian-based optimization algorithms for large-scale, stochastic, distributed, and zeroth-order problems.

For the large-scale setting, we contribute a new way of deriving limited-memory quasi-Newton methods and show that, on some logistic and linear regression problems, they can achieve better results than traditional limited-memory quasi-Newton methods while using less memory.

For the stochastic setting, we relax the secant condition used in traditional quasi-Newton methods and derive a novel quasi-Newton update that always preserves positive definiteness. Based on this update, we develop an algorithm that converges linearly to a neighborhood of the optimal solution, even when gradient and function evaluations are subject to bounded perturbations.

For the distributed setting, we make two contributions. First, we analyze how the error of a Newton step is affected by the condition number and by the number of iterations of an averaging-based consensus algorithm. We show that the number of iterations needed to solve a quadratic problem with relative error less than ε grows logarithmically with 1/ε and with the condition number of the Hessian of the centralized problem. Second, we consider how Hessians and Hessian approximations can be used to compensate for communication delays in asynchronous implementations of the Incremental Aggregated Gradient (IAG) algorithm. We provide a general convergence theorem for analyzing delay compensation with various Hessian approximations, apply it to the previously proposed Curvature-Aided IAG (CIAG), and propose delay compensation with cheaper Hessian approximations that nevertheless outperform IAG without delay compensation.

For the zeroth-order setting, we exploit the fact that a finite-difference estimate of the directional derivative acts as an approximate sketching technique, and use this to propose a zeroth-order extension of a sketched Newton method originally developed for large-scale problems. This extension addresses the combined challenge of large-scale and zeroth-order problems.
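As background for readers less familiar with the terminology above, the sketch below illustrates the classical secant condition and the preservation of positive definiteness in the standard (full-memory) BFGS update. It is textbook material included only for orientation; it is not the relaxed update, the limited-memory derivation, or any other method proposed in the thesis, and all names in the snippet are illustrative.

import numpy as np

def bfgs_update(B, s, y):
    # Classical BFGS update of the Hessian approximation B.
    # The new matrix satisfies the secant condition B_new @ s = y and stays
    # symmetric positive definite whenever the curvature condition s @ y > 0
    # holds; otherwise the update is skipped so B remains positive definite.
    sy = s @ y
    if sy <= 0:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

# Small check on a quadratic f(x) = 0.5 * x^T A x, where the gradient
# difference between two points x and x + s is exactly y = A @ s.
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
s = np.array([1.0, -0.5, 0.2, 0.7, -1.1])
y = A @ s
B_new = bfgs_update(np.eye(5), s, y)
assert np.allclose(B_new @ s, y)               # secant condition holds
assert np.all(np.linalg.eigvalsh(B_new) > 0)   # positive definiteness preserved

The stochastic-setting contribution summarized in the abstract relaxes this exact secant condition and derives an update that always preserves positive definiteness.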

urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-346438