The unifying research theme at the Department of Speech, Music and Hearing is communication and interaction between humans via speech and music. The department is engaged in a diverse set of multi-disciplinary research activities, with the main areas being speech communication, speech technology, multimodal interaction technologies, voice science and technical vocology, music informatics, and auditory perception.

Research in Speech Communication and Speech Technology
Research in Voice Science and Technical Vocology
Research in Music Informatics and Auditory Perception

Current projects at TMH

BabyRobot - Child-Robot Communication and Collaboration
Funding: EU

The main goal of the project is to create robots that analyze and track human behavior over time in the context of their surroundings using audio-visual monitoring in order to establish common ground and intention-reading capabilities.

CityCrowd - Personalized spatially-aware dialogue systems
Funding: VR

A project exploring the intersection between spoken dialogue systems, geographic databases and crowd-sourcing.

COIN - Co-adaptive human-robot interactive systems
Funding: SSF (Stiftelsen för Strategisk Forskning)

The main goal is to develop a systematic, bi-directional short- and long-term adaptive framework that yields safe, effective, efficient, and socially acceptable robot behaviors and human-robot interactions.

CORDIAL - Coordination of Attention and Turn-taking in Situated Interaction
Funding: VR


EACare - Embodied Agent to support elderly mental wellbeing
Funding: SSF (Stiftelsen för Strategisk Forskning)

The main goal of the proposed multidisciplinary research is to develop a robot head with communicative skills capable of interacting with elderly people at their convenience.

FACT - Factories of the Future: Human Robot Cooperative Systems
Funding: SSF (Stiftelsen för Strategisk Forskning)

The focus of FACT is on providing safe and flexible feedback in unforeseen situations, enhancing human-robot cooperation, and learning from experience.

FonaDyn - Phonatory Dynamics and States
Funding: VR

The voice has several non-linear and context-dependent mechanisms that can give rise to distinct phonatory states. We submit that much of the observed variability in objective voice metrics results from the influence of such states, and we will attempt to account for some of them using a state-based analysis paradigm.

IGLU - Interactive Grounded Language Understanding

Language is an ability that develops in young children through joint interaction with their caretakers and their physical environment. At this level, human language understanding can be described as interpreting and expressing semantic concepts (e.g. objects, actions and relations) through what can be perceived (or inferred) from the current context in the environment. Previous work...

InkSynt - Incremental Text-To-Speech Conversion
Funding: VR

We will develop an incremental text-to-speech (TTS) converter that can be used in dynamically changing situations. In the project we will collect speech databases of how people read incrementally displayed text aloud, which will serve as the basis for developing methods for incremental TTS with the correct prosody. We will...

MirrorBot - Data-driven Modelling of Interaction Skills for Social Robots
Funding: SRA/KTH

A project aiming to use robot-mediated human interaction as a means of collecting data for modelling social signals in human-robot interaction.

RoboLearn - Online learning of turn-taking behaviour in spoken human-robot interaction
Funding: VR

In this project, we will investigate how a robot’s turn-taking behaviour can be learned from experience by interacting with people.

SpeakingUp - Making spoken cultural heritage accessible for research
Funding: RJ (Bank of Sweden Tercentenary Foundation)

The overall aim of the project is to make Sweden's archival treasure of recorded speech accessible for research in the humanities and social sciences (HS). SpeakingUp is conducted by the Institute for Language and Folklore (ISOF), KTH and Digisam.

TIG - Timing of intonation and gestures in spoken communication
Funding: RJ (Bank of Sweden Tercentenary Foundation)

The goal of the project is to understand timing relationships between intonation and gesture in spontaneous speech. This will be investigated through semi-automatic extraction of co-speech gestures from a large and varied dataset (audio, video, motion-capture), and analysis of function and synchronization of speech and gestures.

VirtualRobot - Exploring situated interaction with social robots using augmented reality
Funding: SRA/KTH

In this project, we aim to explore the use of Augmented Reality (AR) to investigate the impact of multimodal behaviour (speech, facial expression, full-body motions, conversational formations) and embodiment on turn-taking and joint attention in human-robot interaction.

WikiSpeech
Funding: PTS (Post- och telestyrelsen)

An open-source project that will draw on crowdsourced contributions to make Wikipedia more accessible by adding text-to-speech synthesis, enabling users of the online encyclopedia to have portions of the text read out to them.

Past projects
