
Exploring the Expressive Space of an Articulatory Vocal Model using Quality-Diversity Optimization with Multimodal Embeddings

Joris Grouwels's 50% Seminar

Time: Thu 2025-01-23 10.00 - 11.00

Location: Lindstedtsvägen 24, floor 5, room no. 522 (Fantum)

Video link: Zoom Link



Abstract

Recording Notice

We plan to record this seminar.

The main part of the seminar covers work done in collaboration with Nicolas Jonason and my supervisor Bob Sturm. Knowing which sounds a simulated vocal model can produce, and how they relate to its articulatory behavior, is not trivial. Mapping this out is useful for applications that exploit the extended capabilities of the voice, e.g. singing or vocal imitation. I will present a system that achieves this for a state-of-the-art articulatory vocal model (VocalTractLab) by combining it with a recent Quality-Diversity algorithm (CMA-MAE) and auditory features obtained from a multimodal pretrained model (CLAP). The text capabilities of the latter make it possible to steer the exploration with a text prompt. I will show that the method explores faster and covers more of the measure space than a random-sampling baseline, and I will provide some listening examples.

Beyond this, I will sketch my research plans for the latter part of my PhD. Prof. Arvind Kumar will serve as discussant for the seminar.
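To make the combination described above concrete, below is a rough sketch of this kind of Quality-Diversity loop. It assumes the open-source pyribs library for the CMA-MAE components; the vocal-model rendering and CLAP embedding functions are hypothetical placeholders standing in for VocalTractLab synthesis and CLAP feature extraction, not their actual APIs, and this is not the code presented in the seminar.

    # Sketch of a CMA-MAE loop over articulatory parameters, with a CLAP-style
    # text prompt as objective and embedding coordinates as measures.
    # render_vocal_model, clap_embed_audio and clap_embed_text are placeholders.
    import numpy as np
    from ribs.archives import GridArchive
    from ribs.emitters import EvolutionStrategyEmitter
    from ribs.schedulers import Scheduler

    SOLUTION_DIM = 30  # assumed number of articulatory control parameters

    def render_vocal_model(params):
        # Placeholder for articulatory synthesis: a deterministic dummy waveform.
        rng = np.random.default_rng(abs(hash(params.tobytes())) % (2**32))
        return rng.standard_normal(16000)

    def clap_embed_audio(audio):
        # Placeholder for a CLAP audio embedding: unit-normalized spectral slice.
        v = np.abs(np.fft.rfft(audio))[:512]
        return v / (np.linalg.norm(v) + 1e-9)

    def clap_embed_text(prompt):
        # Placeholder for a CLAP text embedding.
        v = np.resize(np.frombuffer(prompt.encode(), dtype=np.uint8).astype(float), 512)
        return v / (np.linalg.norm(v) + 1e-9)

    prompt_emb = clap_embed_text("a person singing a sustained vowel")

    # 2-D measure space (e.g. two coordinates derived from the audio embedding),
    # with the soft archive thresholds that characterize CMA-MAE.
    archive = GridArchive(
        solution_dim=SOLUTION_DIM,
        dims=[50, 50],
        ranges=[(0.0, 1.0), (0.0, 1.0)],
        learning_rate=0.01,
        threshold_min=-1.0,
    )
    emitters = [EvolutionStrategyEmitter(archive, x0=np.zeros(SOLUTION_DIM),
                                         sigma0=0.2, ranker="imp")]
    scheduler = Scheduler(archive, emitters)

    for _ in range(100):
        solutions = scheduler.ask()
        objectives, measures = [], []
        for params in solutions:
            audio = render_vocal_model(params)
            emb = clap_embed_audio(audio)
            objectives.append(float(emb @ prompt_emb))  # similarity to text prompt
            measures.append(emb[:2])  # placeholder 2-D behavior descriptor
        scheduler.tell(objectives, measures)

The archive then holds, for each region of the measure space, the articulatory parameter vector whose audio best matches the text prompt, which is one way to visualize coverage against a random-sampling baseline.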

Acknowledgement

This work is an outcome of a project that has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (MUSAiC, Grant agreement No. 864189). The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.