
Best paper award at ICMI 2020!

Published Oct 07, 2021

Gesticulator: A Framework for Semantically-aware Speech-driven Gesture Generation. Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Henter, Simon Alexandersson, Iolanda Leite, Hedvig Kjellström

During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial for natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality to represent speech: either audio or text. These systems are therefore confined to producing either acoustically-linked beat gestures or semantically-linked gesticulation (e.g., raising a hand when saying "high"): they cannot appropriately learn to generate both gesture types. We present a model designed to produce arbitrary beat and semantic gestures together. Our deep learning-based model takes both acoustic and semantic representations of speech as input and generates gestures as a sequence of joint angle rotations as output. The resulting gestures can be applied to both virtual agents and humanoid robots. Subjective and objective evaluations confirm the success of our approach. The code and video are available at the project page svito-zar.github.io/gesticulator.
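The core idea described above — fusing acoustic and semantic speech features and mapping them to joint-angle rotations — can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: all dimensions, weights, and names are assumptions, and the randomly initialised network stands in for a trained one.

```python
# Minimal sketch of multimodal speech-to-gesture mapping.
# All dimensions and names are illustrative assumptions,
# not the actual Gesticulator architecture.
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES = 100   # gesture frames to generate
AUDIO_DIM = 26   # e.g. per-frame acoustic features (MFCC-like)
TEXT_DIM = 300   # e.g. word embeddings aligned to frames
HIDDEN_DIM = 64
N_JOINTS = 15    # output: one (x, y, z) rotation triple per joint

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(scale=0.1, size=(AUDIO_DIM + TEXT_DIM, HIDDEN_DIM))
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, N_JOINTS * 3))

def generate_gestures(audio_feats, text_feats):
    """Map frame-aligned speech features to joint-angle rotations."""
    # Early fusion: concatenate the two modalities per frame,
    # so the network can exploit both beat and semantic cues.
    fused = np.concatenate([audio_feats, text_feats], axis=1)
    hidden = np.tanh(fused @ W1)
    # Reshape to (frames, joints, 3 rotation angles).
    return (hidden @ W2).reshape(-1, N_JOINTS, 3)

audio = rng.normal(size=(N_FRAMES, AUDIO_DIM))
text = rng.normal(size=(N_FRAMES, TEXT_DIM))
poses = generate_gestures(audio, text)
print(poses.shape)  # (100, 15, 3)
```

The resulting per-frame joint rotations are exactly the kind of representation that can drive either a virtual agent's skeleton or a humanoid robot's actuators, as the abstract notes.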

Belongs to: Speech, Music and Hearing