
Connected - Context-aware speech synthesis for conversational AI

Speakers use a variety of strategies to continuously adjust their speech delivery in response to the situational context of the conversation. The aim of this project is to develop a context-aware conversational speech synthesizer that allows for implicit control over the manner of speaking, to cater for the communicative needs of different interactive scenarios.

We aim to connect the synthesis process to its own previous outputs to create authentic stretches of speech, to connect it with the speech and cues from the conversation partner, and finally to connect it to the required conversation style and topic at hand. We will do this with the following contributions:

  • A spontaneous speech corpus of situation-dependent speech phenomena recorded in a range of interaction scenarios for training context-aware spontaneous speech synthesis
  • A novel architecture that allows for continuous speech as input which will be employed to develop two synthesizers:
    • A spontaneous speech synthesizer that uses breath and disfluencies to implicitly control the manner of speaking,
    • A conversational speech synthesizer that is able to align its manner of speaking to the conversation partner.
  • Quantitative and qualitative evaluations that verify the benefits of the methods and systems listed above in conversational human-machine interactions in different scenarios.

This project is part of our efforts in recent years to develop spontaneous conversational speech synthesis. For examples, please visit this page or watch the demo below.



VR (2019-05003)


2020 → 2025