Perception of speaker stance - using spontaneous speech synthesis to explore the contribution of prosody, context and speaker.
Speaker stance is broadly defined as the speaker’s attitude towards what they are saying. In this project we aim to establish a new research methodology for studying the perceptual contributions of prosodic features, speaker characteristics and contextual embedding, using modifiable data-driven speech synthesis.
With our flexible text-to-speech (TTS) system, built on real conversational data, we are able to synthesise natural speech examples of selected prosodic constructions with different linguistic content, different speaker characteristics and modified voice quality.
With the proposed method we will conduct comparative perceptual experiments in which spontaneous speech synthesis is used to vary these speech features systematically, and we will measure their individual and combined perceptual impact. Our hope is that the findings from these experiments will be directly applicable in areas such as language learning, special education and public speaker training. The tools we use and develop in this project to create the stimuli will be made available to the scientific community, so that the established methodology can be employed in other areas of speech research.
We will perform three sets of studies on the perception of speaker stance:
Prosody (same words, different prosody) - investigating how the combination of prosody and linguistic content affects the perception of stance
Context (different words, same prosody) - investigating how situational and conversational contexts affect the perception of stance
Speaker (same words and prosody, different speaker) - investigating listener bias based on speaker characteristics such as accent, age and gender
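The logic of the three study sets, where one factor is varied while the others are held constant, can be illustrated with a minimal sketch. It is not the project's actual tooling: the factor levels (`text_A`, `prosody_1`, `speaker_X` and so on) and the helper `conditions` are hypothetical placeholders, and the real experiments may vary the factors in other ways.

```python
from itertools import product

# Hypothetical placeholder levels; the project description does not specify them.
WORDS = ["text_A", "text_B"]
PROSODY = ["prosody_1", "prosody_2"]
SPEAKERS = ["speaker_X", "speaker_Y"]


def conditions(vary_words, vary_prosody, vary_speaker):
    """List stimulus conditions, fixing each non-varied factor at its first level."""
    words = WORDS if vary_words else WORDS[:1]
    prosody = PROSODY if vary_prosody else PROSODY[:1]
    speakers = SPEAKERS if vary_speaker else SPEAKERS[:1]
    return [
        {"words": w, "prosody": p, "speaker": s}
        for w, p, s in product(words, prosody, speakers)
    ]


# One entry per study set: only the named factor changes across stimuli.
studies = {
    "prosody": conditions(False, True, False),  # same words, different prosody
    "context": conditions(True, False, False),  # different words, same prosody
    "speaker": conditions(False, False, True),  # same words and prosody, different speaker
}

for name, conds in studies.items():
    print(name, conds)
```

Passing `True` for more than one factor yields the full crossing of those factors, which is one way combined effects could be laid out for comparison.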
This project is part of our efforts in recent years to develop spontaneous conversational speech synthesis. For examples, please visit www.speech.kth.se/tts-demos .
Duration: 2020 - 2025