Online learning of turn-taking behaviour in spoken human-robot interaction

In this project, we will investigate how a robot’s turn-taking behaviour can be learned from experience by interacting with people.

One vision for the future is that social robots will appear in supermarkets, schools, the manufacturing industry, and people’s homes. The success of this development will depend on how well humans can communicate with these robots, and the most natural way of interacting with them is likely to be spoken face-to-face interaction. However, the interaction possibilities offered by current social robots are still very limited and very different from how we are used to communicating with other humans. One reason for these shortcomings is that current systems rely on an overly simplistic model of turn-taking. Most conversational systems today use a simple silence threshold to decide when the system should respond. Silence, however, is not a good indicator: sometimes there is silence but no turn change is intended (e.g., during hesitations), and sometimes there is no silence, yet the turn should change without any noticeable gap. In human-human interaction, speakers continuously monitor several different cues (such as syntax, semantics, intonation, gaze, and body motion) to detect relevant places to take the turn. Failing to model this correctly may result in the robot interrupting the user or responding with a delay. In previous studies, we have shown how machine learning can be used to allow the system to detect these cues, giving the system a more human-like turn-taking behaviour. However, this approach relies on manual annotation of recorded data. In this project, we will investigate how a robot’s turn-taking behaviour can be learned from experience by interacting with people, without any need for annotated data.
By monitoring how the robot’s turn-taking behaviour results in either smooth turn-taking or in interaction problems (such as overlapping speech or long gaps), the robot can get automatic feedback on its behaviour and thereby train the turn-taking model in an unsupervised (or implicitly supervised) fashion, without needing any manual annotation. If several humans are interacting with the robot, it should also be possible to further improve the turn-taking model by observing where the humans take the turn when talking to each other. The models will be developed and evaluated in a setting where a robot solves a collaborative problem together with two humans. We have already built such a system (albeit with a simplistic model of turn-taking) and exhibited it at the Swedish National Museum of Science and Technology, which means that we have a large set of data on which to train the initial models.
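
As a rough illustration of the implicit supervision idea, the sketch below turns the observed outcome of a single user pause into a training label. It is not the project’s implementation: the `PauseEvent` structure, the `implicit_feedback` function, and both threshold values are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical thresholds, for illustration only.
MAX_GAP_S = 1.0         # a longer silence before the robot responds counts as a long gap
OVERLAP_WINDOW_S = 1.0  # user speech this soon after the robot starts counts as overlap


@dataclass
class PauseEvent:
    pause_start: float            # time (s) at which the user fell silent
    robot_start: Optional[float]  # time the robot began speaking, None if it stayed silent
    user_resume: Optional[float]  # time the user resumed speaking, None if they did not


def implicit_feedback(ev: PauseEvent) -> Tuple[Optional[str], Optional[str]]:
    """Return (outcome, label) for one pause.

    outcome: 'smooth', 'overlap', 'long_gap', or None if there is no evidence.
    label:   'hold' (the user wanted to keep the turn) or 'yield'
             (a turn change was appropriate), usable as a training target.
    """
    if ev.robot_start is None:
        # The robot waited. If the user continued, waiting was the right choice.
        if ev.user_resume is not None:
            return ("smooth", "hold")
        return (None, None)

    # The robot spoke. User speech before or shortly after the robot's onset
    # suggests that the robot interrupted a turn that was not finished.
    if ev.user_resume is not None and ev.user_resume < ev.robot_start + OVERLAP_WINDOW_S:
        return ("overlap", "hold")

    # No overlap: the turn change was appropriate, but it may have come late.
    gap = ev.robot_start - ev.pause_start
    if gap > MAX_GAP_S:
        return ("long_gap", "yield")
    return ("smooth", "yield")
```

Labels produced this way could be paired with the cues observed at the pause (for example intonation or gaze) and used to update a turn-taking model without any manual annotation.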


Gabriel Skantze (Project leader)
Martin Johansson

Funding: Swedish Research Council (VR), grant 2015-03763

Duration: 2016 - 2018
