Approximate Solution Methods to Optimal Control Problems via Dynamic Programming Models
Time: Mon 2021-12-20 10.00
Location: Q2, Malvinas väg 10, Stockholm
Subject area: Electrical Engineering Optimization and Systems Theory
Doctoral student: Yuchao Li, Reglerteknik
Opponent: Associate Professor Pontus Giselsson, Department of Automatic Control, Lund University
Supervisor: Professor Jonas Mårtensson, Reglerteknik; Professor Karl H. Johansson, Reglerteknik
Optimal control theory has a long history and broad applications. Motivated by the goal of obtaining insights through unification and taking advantage of the abundant capability to generate data, this thesis introduces some suboptimal schemes via abstract dynamic programming models.
As our first contribution, we consider deterministic infinite horizon optimal control problems with nonnegative stage costs. We draw inspiration from the learning model predictive control scheme designed for continuous dynamics and iterative tasks, and propose a rollout algorithm that relies on sampled data generated by some base policy. The proposed algorithm is based on value and policy iteration ideas. It applies to deterministic problems with arbitrary state and control spaces and arbitrary dynamics. It admits extensions to problems with trajectory constraints and to problems with a multiagent structure.
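To make the rollout idea concrete, the following is a minimal sketch of plain one-step-lookahead rollout on a deterministic problem. It is not the sampled-data algorithm proposed in the thesis. The toy problem is an assumption made up for illustration: integer states, a cost-free target state 0, and step sizes 1 to 3 with hypothetical costs. All function names are hypothetical as well. The sketch only shows the cost-improvement property that rollout inherits from policy iteration.

```python
# Toy deterministic problem (hypothetical): reach state 0 from a positive integer state.
STEP_COST = {1: 1.0, 2: 1.5, 3: 3.5}  # assumed nonnegative stage costs per step size


def controls(x):
    """Admissible step sizes at state x (never overshoot the target 0)."""
    return [u for u in STEP_COST if u <= x]


def step(x, u):
    """Deterministic dynamics: move u units toward the target."""
    return x - u


def stage_cost(x, u):
    return STEP_COST[u]


def base_policy(x):
    """Base policy: always take the smallest step."""
    return 1


def policy_cost(policy, x, max_steps=1000):
    """Cost of following `policy` from x until the cost-free target 0 is reached."""
    total = 0.0
    for _ in range(max_steps):
        if x == 0:
            return total
        u = policy(x)
        total += stage_cost(x, u)
        x = step(x, u)
    raise RuntimeError("target not reached within max_steps")


def rollout_policy(x):
    """One-step lookahead, using the base policy's cost as the cost-to-go estimate."""
    return min(
        controls(x),
        key=lambda u: stage_cost(x, u) + policy_cost(base_policy, step(x, u)),
    )


if __name__ == "__main__":
    for x0 in (1, 4, 6):
        print(x0, policy_cost(base_policy, x0), policy_cost(rollout_policy, x0))
```

In this toy problem the rollout policy is never worse than the base policy from any start state, because the base policy is stationary and its cost is evaluated exactly by simulation.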
In addition, abstract dynamic programming models are used to analyze $\lambda$-policy iteration with randomization algorithms. In particular, we consider contractive models with infinitely many policies. We show that well-posedness of the $\lambda$-operator plays a central role in the algorithm. The operator is known to be well-posed for problems with finitely many states, but our analysis shows that it is also well-defined for contractive models with infinitely many states. Similarly, the algorithm we analyze is known to converge for problems with finitely many policies, but we identify the conditions required to guarantee convergence with probability one when the policy space is infinite, regardless of the number of states. Guided by the analysis, we exemplify a data-driven approximate implementation of the algorithm for estimating the optimal costs of constrained linear and nonlinear control problems. Numerical results indicate the potential of this method in practice.
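As a rough illustration of the kind of iteration analyzed here, the sketch below runs a randomized $\lambda$-policy iteration on a tiny discounted Markov decision problem with two states and two controls. It rests on several assumptions. The formulation is the one commonly found in the abstract dynamic programming literature: the greedy policy's operator is applied a random number of times m with P(m = l) = (1 - lambda) * lambda^(l - 1), and the thesis's exact setting may differ. The transition probabilities, costs, discount factor, and all names are hypothetical. The setting is the finite one, not the infinite-policy case treated in the thesis.

```python
import random

ALPHA = 0.5  # discount factor (assumed)
LAM = 0.7    # lambda parameter of the geometric distribution (assumed)
STATES = [0, 1]
ACTIONS = [0, 1]
# Hypothetical model: P[s][a] lists next-state probabilities, G[s][a] is the stage cost.
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]}, 1: {0: [0.5, 0.5], 1: [0.0, 1.0]}}
G = {0: {0: 2.0, 1: 1.0}, 1: {0: 0.5, 1: 1.5}}


def q_value(J, s, a):
    """Stage cost plus discounted expected cost-to-go under J."""
    return G[s][a] + ALPHA * sum(p * J[t] for t, p in enumerate(P[s][a]))


def greedy(J):
    """Policy mu attaining the minimum in the Bellman operator: T_mu J = T J."""
    return [min(ACTIONS, key=lambda a: q_value(J, s, a)) for s in STATES]


def apply_T_mu(J, mu):
    return [q_value(J, s, mu[s]) for s in STATES]


def apply_T(J):
    return [min(q_value(J, s, a) for a in ACTIONS) for s in STATES]


def geometric(rng, lam):
    """Sample m >= 1 with P(m = l) = (1 - lam) * lam**(l - 1)."""
    m = 1
    while rng.random() < lam:
        m += 1
    return m


def randomized_lambda_pi(iterations=60, seed=0):
    rng = random.Random(seed)
    # Start from an upper bound on every policy's cost, so the iterates decrease monotonically.
    j0 = max(G[s][a] for s in STATES for a in ACTIONS) / (1.0 - ALPHA)
    J = [j0] * len(STATES)
    for _ in range(iterations):
        mu = greedy(J)
        for _ in range(geometric(rng, LAM)):
            J = apply_T_mu(J, mu)  # J <- T_mu^m J with random m
    return J


def value_iteration(iterations=200):
    """Reference solution by repeated application of the Bellman operator T."""
    J = [0.0] * len(STATES)
    for _ in range(iterations):
        J = apply_T(J)
    return J


if __name__ == "__main__":
    print(randomized_lambda_pi(), value_iteration())
```

Averaging over the random number m recovers the $\lambda$-operator, $(1-\lambda)\sum_{l \ge 0} \lambda^{l} T_\mu^{l+1} J$. This is why, in the infinite-state setting, the operator's well-posedness becomes the central question.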