Sergi Andreu: Neural synthesis of sound effects using flow-based deep generative models

Time: Fri 2022-09-23 13.15

Location: 3721, Lindstedtsvägen 25

Respondent: Sergi Andreu

Opponent: Sirak Ghebreamlak

Supervisors: Elias Jarlebring, Mónica Villanueva, Konrad Tollmar

Generating diverse sound effects for video games is a time-consuming task, and the effort grows with the size and complexity of the game. We adopt WaveFlow, a flow-based deep generative model intended for speech synthesis, to generate variations of explosion sounds conditioned on lower-dimensional mel spectrograms. This work suggests that flow-based models can generate high-quality raw audio waveforms of sound effects while learning from small datasets, and it proposes metrics for evaluating the quality of the generated audio. It is, to the best of our knowledge, the first adaptation of these techniques to sound effect generation.