Generation and evaluation of co-speech gestures for embodied conversational agents

Time: Thu 2021-02-25 10.30

Lecturer: Taras Kucherenko

Location: Zoom

Seminar type: 80% seminar

When: Thursday, 25 February 10:30-12:00

Chairman: Danica Kragic


Humans use non-verbal behaviors to signal their intent, emotions, and attitudes in human-human interactions. Embodied conversational agents (ECAs) therefore need this ability as well to make interaction pleasant and efficient. An important part of non-verbal communication is gesticulation: co-speech hand gestures communicate a large share of non-verbal content. The task of hand gesture generation has been studied extensively over the last few decades. Initially, most methods were rule-based, but recent state-of-the-art methods are data-driven, and we continue this line of research.

In this talk, I will present the two components of my thesis: 1) machine-learning models for hand gesture generation; 2) evaluation and benchmarking of the state of the art in the field of gesture generation.

1) We have proposed three different gesture generation models. The first is a deterministic model that uses only audio. The second is a deterministic model that uses both audio and text. The third is a work-in-progress probabilistic model that uses both audio and text and aims to generate meaningful gestures. The resulting gestures can be applied to both virtual agents and humanoid robots.

2) Individual research efforts in the field of gesture generation are difficult to compare because there are no established benchmarks. To address this situation, we launched the GENEA gesture-generation challenge. We have also investigated whether online participants are as attentive as offline participants. Finally, we developed a system that integrates co-speech gesture generation models into an interactive embodied conversational agent in real time. It is intended to facilitate the evaluation of modern gesture generation models in interaction.