Scalable Methods for Developing Interlocutor-aware Embodied Conversational Agents

Data Collection, Behavior Modeling, and Evaluation Methods

Time: Fri 2022-03-25 14.00

Location: U1, Brinellvägen 26, Stockholm

Video link:

Language: English

Subject area: Speech and Music Communication

Doctoral student: Patrik Jonell, Tal, musik och hörsel, TMH

Opponent: Professor Justine Cassell

Supervisor: Jonas Beskow, Tal, musik och hörsel, TMH; Professor Joakim Gustafsson, Tal, musik och hörsel, TMH; Assistant Professor Gustav Eje Henter, Tal, musik och hörsel, TMH

QC 20220307


This work presents several methods, tools, and experiments that contribute to the development of interlocutor-aware Embodied Conversational Agents (ECAs). Interlocutor-aware ECAs take the interlocutor's behavior into consideration when generating their own non-verbal behaviors. This thesis targets the development of such adaptive ECAs by identifying and contributing to three important and related topics:

1) Data collection methods are presented, both for large-scale crowdsourced data collection and for in-lab data collection with a large number of sensors in a clinical setting. Experiments show that experts deemed dialog data collected using a crowdsourcing method to be better for dialog generation purposes than dialog data from other commonly used sources.

2) Methods for behavior modeling are presented, in which machine learning models are used to generate facial gestures for ECAs. Methods for both single-speaker and interlocutor-aware generation are presented.

3) Evaluation methods are explored, and both third-party evaluation of generated gestures and interaction experiments on interlocutor-aware gesture generation are discussed. For example, an experiment is carried out investigating the social influence of a mimicking social robot. Furthermore, a method for more efficient perceptual experiments is presented. This method is validated by replicating a previously conducted perceptual experiment on virtual agents; the replication shows that the new method provides similar, and in fact more, insights into the data while requiring less of the evaluators' time. A second study compared subjective evaluations of generated gestures performed in the lab with evaluations performed using crowdsourcing, and showed no difference between the two settings.

A special focus in this thesis is given to scalable methods, which make it possible to rapidly collect interaction data from a broad range of people and to efficiently evaluate results produced by the machine learning methods. This in turn allows for fast iteration when developing interlocutor-aware ECA behaviors.