Reinforcement Learning Endowed Robot Planning under Spatiotemporal Logic Specifications

Time: Mon 2019-12-09 10.00

Location: V3, Teknikringen 72, KTH Campus, Stockholm (English)

Subject area: Electrical Engineering

Doctoral student: Peter Varnai, Reglerteknik

Opponent: Assistant Professor Ufuk Topcu, The University of Texas at Austin

Supervisor: Professor Dimos V. Dimarogonas, Reglerteknik


Recent advances in artificial intelligence are producing fascinating results in the field of computer science. These successes have fueled a growing desire to transfer and implement learning methods on real-life systems. The increased autonomy and intelligence of the resulting systems in carrying out complex tasks can be expected to revolutionize both industry and our everyday lives. This thesis takes a step towards this goal by studying reinforcement learning methods for solving optimal control problems with task-satisfaction constraints. More specifically, it considers spatiotemporal tasks given in the expressive language of signal temporal logic.

We begin by introducing our proposed solution to the task-constrained optimal control problem, which is based on blending traditional control methods with more recent, data-driven approaches. We adopt the paradigm that the two approaches should be considered as endpoints of a continuous spectrum, and incorporate partial knowledge of the system dynamics into the learning process in the form of guidance controllers. These guidance controllers aid in enforcing the task-satisfaction constraint, allowing the system to explore for optimal trajectories in a more sample-efficient manner. The proposed solution algorithm is termed guided policy improvement with path integrals (G-PI2). We also propose a framework for deriving effective guidance controllers and illustrate the importance of this guidance through a simulation case study.
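As a rough illustration of the path-integral policy improvement idea that G-PI2 builds on, the sketch below shows only the generic cost-weighted averaging step: sampled parameter perturbations are weighted by an exponential of their normalized rollout costs and then averaged into a parameter update. The function names, the temperature parameter `h`, and the toy data are assumptions made for illustration; the guidance controllers and the task-satisfaction constraint handling of the thesis are not modeled here.

```python
import numpy as np

def pi2_weights(costs, h=10.0):
    """Exponential weights over rollouts; lower cost gives higher weight.

    `h` is a hypothetical temperature parameter chosen for illustration.
    """
    costs = np.asarray(costs, dtype=float)
    spread = costs.max() - costs.min()
    if spread == 0.0:
        # All rollouts are equally good: fall back to uniform weights.
        return np.full(costs.shape, 1.0 / costs.size)
    expo = np.exp(-h * (costs - costs.min()) / spread)
    return expo / expo.sum()

def pi2_update(theta, perturbations, costs, h=10.0):
    """Shift the parameters by the cost-weighted average perturbation.

    `perturbations` has shape (K, d) for K rollouts and d parameters.
    """
    weights = pi2_weights(costs, h)
    return theta + weights @ perturbations

# Toy example: three rollouts with two policy parameters each.
theta = np.zeros(2)
perturbations = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
costs = [3.0, 1.0, 2.0]
new_theta = pi2_update(theta, perturbations, costs)
```

In this toy example the second rollout has the lowest cost, so the update moves the parameters almost entirely along its perturbation.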

The thesis also considers a diverse range of enhancements to the developed G-PI2 algorithm. First, the effectiveness of the guidance laws is increased by continuously updating their parameters throughout the learning process using so-called funnel adaptation. Second, we explore a learning framework for gathering and storing experiences gained from previously solved problems in order to efficiently tackle changes in initial conditions or task specifications in future missions. Finally, we look at how so-called robustness metrics, which quantify the extent of task satisfaction for signal temporal logic, can be explicitly defined in order to guide the learning process towards task-satisfying trajectories. The multidisciplinary nature of the examined task-constrained optimal control problem offers a broad range of additional research directions to consider in future work, which are discussed in detail as well.
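To make the notion of a robustness metric concrete, the following minimal sketch evaluates the classical min/max quantitative semantics of signal temporal logic on a sampled, discrete-time signal: a positive value means the task is satisfied, and its magnitude indicates by how much. This is the commonly used textbook definition, not necessarily the metrics defined in the thesis; the helper names, the example signal, and the thresholds are assumptions made for illustration.

```python
import numpy as np

def always(rho, a, b):
    """Robustness of 'always on [a, b]': the worst case over the window."""
    return float(np.min(rho[a:b + 1]))

def eventually(rho, a, b):
    """Robustness of 'eventually on [a, b]': the best case over the window."""
    return float(np.max(rho[a:b + 1]))

# Hypothetical sampled signal x(t) at time steps t = 0, ..., 4.
x = np.array([0.0, 0.5, 1.2, 2.0, 1.5])

# Predicate robustness values: positive exactly when the predicate holds.
reach = x - 1.0   # predicate x >= 1
stay = 3.0 - x    # predicate x <= 3

# Task: eventually reach x >= 1 AND always keep x <= 3 (conjunction = min).
rho_reach = eventually(reach, 0, 4)
rho_stay = always(stay, 0, 4)
rho_task = min(rho_reach, rho_stay)

# A stricter task that this signal violates: always x >= 1.
rho_violated = always(reach, 0, 4)
```

Here `rho_task` is positive, so the signal satisfies the combined task, while `rho_violated` is negative because the signal starts below the threshold.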