Scalable Reinforcement Learning for Formation Control with Collision Avoidance
Time: Fri 2022-12-16, 09.00-10.00
Location: Harry Nyquist
Respondent: Andreu Matoses Gimenez, DCS/Automatic Control
Opponent: Gregorio Marchesini
Supervisor: Ingvar Max Ziemann
Examiner: Alexandre Proutière
In recent decades, significant theoretical advances have been made in the field of distributed multi-agent control theory. Among the most common problems that can be modeled as multi-agent systems are the so-called formation control problems, in which a network of mobile agents is controlled to move towards a desired final formation.
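To make the formation control problem concrete, the following Python sketch implements a classical distributed displacement-based formation controller. This is not the reinforcement learning method of the thesis; it only illustrates the problem setting. Each agent i, with single-integrator dynamics x_i <- x_i + dt * u_i, applies u_i = sum over neighbors j of ((x_j - x_i) - (d_j - d_i)), where d_i is its hypothetical desired offset in the formation. The offsets, the complete communication graph and the step size dt are illustrative assumptions. Each agent uses only relative information about its neighbors, and the formation is reached up to a common translation when the graph is connected and dt is small enough (dt times the largest Laplacian eigenvalue below 2).

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def formation_step(pos: List[Vec], offsets: List[Vec],
                   neighbors: Dict[int, List[int]], dt: float = 0.1) -> List[Vec]:
    """One Euler step of x_i <- x_i + dt * u_i with a displacement-based control law."""
    new_pos = []
    for i, (xi, yi) in enumerate(pos):
        ux = uy = 0.0
        for j in neighbors[i]:
            # Actual minus desired relative displacement to neighbor j.
            ux += (pos[j][0] - xi) - (offsets[j][0] - offsets[i][0])
            uy += (pos[j][1] - yi) - (offsets[j][1] - offsets[i][1])
        new_pos.append((xi + dt * ux, yi + dt * uy))
    return new_pos


def formation_error(pos: List[Vec], offsets: List[Vec],
                    neighbors: Dict[int, List[int]]) -> float:
    """Sum of squared relative-displacement errors over all neighbor pairs."""
    err = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            dx = (pos[j][0] - pos[i][0]) - (offsets[j][0] - offsets[i][0])
            dy = (pos[j][1] - pos[i][1]) - (offsets[j][1] - offsets[i][1])
            err += dx * dx + dy * dy
    return err


# Hypothetical example: three agents converging to a triangle formation.
offsets = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.9)]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
pos = [(2.0, -1.0), (-1.5, 0.5), (0.0, 3.0)]
for _ in range(200):
    pos = formation_step(pos, offsets, neighbors)
print(formation_error(pos, offsets, neighbors))
```

For an undirected graph the control inputs sum to zero, so the centroid of the agents is preserved while the relative displacements converge to the desired ones.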
This thesis presents a scalable and localized reinforcement learning approach to a traditional multi-agent formation control problem with collision avoidance. A scalable advantage actor-critic reinforcement learning algorithm, based on previous work in the literature, is presented. Suboptimality bounds are calculated for the localized approximations of the accumulated reward and of the policy gradient. The algorithm is tested in a two-dimensional setting, with a network of mobile agents following simple integrator dynamics and stochastic localized policies. Neural networks are used to approximate the continuous value functions and policies. The formation control with collision avoidance formulation and the presented algorithm show good scalability properties, with the number of function approximation parameters growing polynomially with the number of agents. The reduced number of parameters decreases learning time for larger networks, although computational efficiency is lower than that of state-of-the-art machine learning implementations. The policies obtained achieve probably safe trajectories; however, the lack of a dynamic model makes it impossible to guarantee safety.
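A heavily simplified, pure-Python sketch of a localized one-step advantage actor-critic update for this setting follows. It rests on several assumptions and is not the thesis implementation. Linear function approximators replace the neural networks. Each agent's Gaussian policy has a single hypothetical gain theta_i acting on its local formation error. The reward (negative squared local formation error minus a fixed penalty per neighbor closer than D_SAFE) is invented for illustration. Averaging TD errors over an agent's closed neighborhood only loosely mimics the localized policy-gradient approximation. All names and constants (OFFSETS, NEIGHBORS, D_SAFE, THETA_MAX, features, train, ...) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vec = Tuple[float, float]

# Hypothetical setup: 3 agents, triangle formation, complete communication graph.
OFFSETS: List[Vec] = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.9)]
NEIGHBORS: Dict[int, List[int]] = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
D_SAFE, COLLISION_PENALTY = 0.3, 10.0
DT, SIGMA, GAMMA, THETA_MAX = 0.1, 0.2, 0.95, 5.0


def local_error(pos: List[Vec], i: int) -> Vec:
    """Summed displacement error between agent i and its neighbors (local observation)."""
    ex = ey = 0.0
    for j in NEIGHBORS[i]:
        ex += (pos[j][0] - pos[i][0]) - (OFFSETS[j][0] - OFFSETS[i][0])
        ey += (pos[j][1] - pos[i][1]) - (OFFSETS[j][1] - OFFSETS[i][1])
    return ex, ey


def local_reward(pos: List[Vec], i: int) -> float:
    """Negative squared local error minus a penalty per neighbor closer than D_SAFE."""
    ex, ey = local_error(pos, i)
    collisions = sum(1 for j in NEIGHBORS[i] if math.dist(pos[i], pos[j]) < D_SAFE)
    return -(ex * ex + ey * ey) - COLLISION_PENALTY * collisions


def features(e: Vec) -> Tuple[float, float]:
    """Bounded critic features [1, g] with g = |e|^2 / (1 + |e|^2)."""
    sq = e[0] ** 2 + e[1] ** 2
    return 1.0, sq / (1.0 + sq)


def value(w: List[float], e: Vec) -> float:
    f0, f1 = features(e)
    return w[0] * f0 + w[1] * f1


def train(episodes: int = 20, steps: int = 50, alpha_actor: float = 1e-4,
          alpha_critic: float = 1e-2, seed: int = 0):
    rng = random.Random(seed)
    n = len(OFFSETS)
    theta = [1.0] * n                   # policy gain per agent
    w = [[0.0, 0.0] for _ in range(n)]  # linear critic weights per agent
    for _ in range(episodes):
        pos = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        for _ in range(steps):
            errs = [local_error(pos, i) for i in range(n)]
            # Each agent samples a velocity from N(theta_i * e_i, SIGMA^2 I).
            acts = [(theta[i] * errs[i][0] + rng.gauss(0.0, SIGMA),
                     theta[i] * errs[i][1] + rng.gauss(0.0, SIGMA)) for i in range(n)]
            # Single-integrator dynamics: x_i <- x_i + DT * u_i.
            pos = [(p[0] + DT * a[0], p[1] + DT * a[1]) for p, a in zip(pos, acts)]
            new_errs = [local_error(pos, i) for i in range(n)]
            # One-step TD errors serve as advantage estimates.
            td = [local_reward(pos, i) + GAMMA * value(w[i], new_errs[i])
                  - value(w[i], errs[i]) for i in range(n)]
            for i in range(n):
                # Critic: semi-gradient TD(0) on the local features.
                f0, f1 = features(errs[i])
                w[i][0] += alpha_critic * td[i] * f0
                w[i][1] += alpha_critic * td[i] * f1
                # Actor: TD error averaged over the closed neighborhood (assumption).
                adv = (td[i] + sum(td[j] for j in NEIGHBORS[i])) / (1 + len(NEIGHBORS[i]))
                e, u = errs[i], acts[i]
                # d/dtheta of log N(u; theta * e, SIGMA^2 I).
                grad = ((u[0] - theta[i] * e[0]) * e[0]
                        + (u[1] - theta[i] * e[1]) * e[1]) / SIGMA ** 2
                # Projected gradient ascent keeps theta in [0, THETA_MAX].
                theta[i] = min(max(theta[i] + alpha_actor * adv * grad, 0.0), THETA_MAX)
    return theta, w
```

Calling `theta, w = train()` returns each agent's policy gain and critic weights. Projecting theta onto [0, THETA_MAX] is a design choice of this sketch: with DT = 0.1 and this three-agent graph it keeps the mean closed-loop dynamics stable. Note that every agent's parameter count depends only on its neighborhood, which is the property behind the scalability discussed above.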