Human Speech and Communication

In 1951, while at MIT, Gunnar Fant famously stated that “we speak to be heard in order to be understood” in the seminal “Preliminaries to Speech Analysis: The Distinctive Features and their Correlates” together with Roman Jakobson and Morris Halle. Also in 1951, Gunnar returned to Sweden and founded KTH Speech, Music and Hearing (then the Speech Transmission Laboratory).

The question of how people communicate so effortlessly in speech was central to Gunnar, and using technology as a means to insight through objective analysis of human behaviour was at the heart of his research from the very beginning. He created one of the world’s first speech synthesis systems not primarily to make applications of artificial speech possible, but to test his hypotheses about speech production. The motivation is captured by what was written on Richard Feynman’s blackboard at the time of his death: “What I cannot create, I do not understand”. KTH Speech, Music and Hearing has continuously sought to understand and describe human communicative behaviours, from speaking to listening to understanding, through objective rather than subjective analysis, and the “analysis by synthesis” pioneered by Gunnar is still very much a part of our methodological toolbox.

A sizeable portion of the research at KTH Speech, Music and Hearing aims to describe, explain and model human behaviours, and to improve the technology and methodology that allow us to do so. The lab regularly captures human communicative behaviours in large, multi-modal data collections in support of research on several levels and in a range of disciplines. In fundamental research, these collections serve as a basis for investigations of human behaviour – an endeavour with strong ties to the humanities. In applied research, they provide the basis for models of human-like behaviours for implementation in, for example, human-like dialogue systems and applications within social robotics. Add to this a strong tradition of iterative research, where models and analysis methods developed on one data set are used to improve the capture and analysis of the next. Recently, more powerful analysis methods have also allowed us to take a new interest in data that already exist but were recorded for other purposes – so-called “found data”. There is a wealth of such data lying around. In Swedish archives alone, thousands if not millions of hours of speech data sit unused. If we can learn how to analyse these data reliably, it will mean a huge step forward.

Analyses of what humans do when they talk to each other place high and particular demands on the availability of both analysis methods and big data. One of the greatest challenges right now is the lack of supporting infrastructure that allows access to the vast resources that exist but are currently out of reach. KTH addresses this through a number of efforts and collaborations, at both the national and international level, that aim to create a Swedish speech technology infrastructure that allows us to maintain a leading position in speech research.

Page responsible: Web editors at EECS
Belongs to: Speech, Music and Hearing
Last changed: Oct 14, 2019