Sergi Andreu: Neural synthesis of sound effects using flow-based deep generative models
Time: Fri 2022-09-23 13.15
Location: 3721, Lindstedtsvägen 25
Respondent: Sergi Andreu
Opponent: Sirak Ghebreamlak
Supervisors: Elias Jarlebring, Mónica Villanueva, Konrad Tollmar
Generating diverse sound effects for video games is a time-consuming task that grows with the size and complexity of the games. We adopt WaveFlow, a flow-based deep generative model intended for speech synthesis, to generate variations of explosion sounds, given lower-dimensional conditioners in the form of mel spectrograms. This work suggests that flow-based models can generate high-quality raw audio waveforms of sound effects while learning from small datasets, and it proposes metrics for evaluating the quality of the generated audio. It is, to the best of our knowledge, the first adaptation of these techniques to sound effect generation.