Data-Driven Methods for Contact-Rich Manipulation: Control Stability and Data-Efficiency
Time: Fri 2021-09-17 14.00
Subject area: Computer Science
Doctoral student: Shahbaz Abdul Khader , Robotik, perception och lärande, RPL
Opponent: Associate Professor Ludovic Righetti, New York University
Supervisor: Professor Danica Kragic, Robotik, perception och lärande, RPL, Centrum för autonoma system, CAS
Autonomous robots are expected to make a greater presence in the homes and workplaces of human beings. Unlike their industrial counterparts, autonomous robots have to deal with a great deal of uncertainty and lack of structure in their environment. A remarkable aspect of performing manipulation in such a scenario is the possibility of physical contact between the robot and the environment. Therefore, not unlike human manipulation, robotic manipulation has to manage contacts, both expected and unexpected, that are often characterized by complex interaction dynamics.
Skill learning has emerged as a promising approach for robots to acquire rich motion generation capabilities. In skill learning, data driven methods are used to learn reactive control policies that map states to actions. Such an approach is appealing because a sufficiently expressive policy can almost instantaneously generate appropriate control actions without the need for computationally expensive search operations. Although reinforcement learning (RL) is a natural framework for skill learning, its practical application is limited for a number of reasons. Arguably, the two main reasons are the lack of guaranteed control stability and poor data-efficiency. While control stability is necessary for ensuring safety and predictability, data-efficiency is required for achieving realistic training times. In this thesis, solutions are sought for these two issues in the context of contact-rich manipulation.
First, this thesis addresses the problem of control stability. Despite unknown interaction dynamics during contact, skill learning with stability guarantee is formulated as a model-free RL problem. The thesis proposes multiple solutions for parameterizing stability-aware policies. Some policy parameterizations are partly or almost wholly deep neural networks. This is followed by policy search solutions that preserve stability during random exploration, if required. In one case, a novel evolution strategies-based policy search method is introduced. It is shown, with the help of real robot experiments, that Lyapunov stability is both possible and beneficial for RL-based skill learning.
Second, this thesis addresses the issue of data-efficiency. Although data-efficiency is targeted by formulating skill learning as a model-based RL problem, only the model learning part is addressed. In addition to benefiting from the data-efficiency and uncertainty representation of the Gaussian process, this thesis further investigates the benefits of adopting the structure of hybrid automata for learning forward dynamics models. The method also includes an algorithm for predicting long-term trajectory distributions that can represent discontinuities and multiple modes. The proposed method is shown to be more data-efficient than some state-of-the-art methods.