The perception of distorted speech

Time: Fri 2021-05-28 15.15

Location: Zoom

Lecturer: Prof. Martin Cooke

No theory of speech perception is complete without an explanation of how
listeners are able to handle severely degraded forms of speech. Starting
with a brief overview of a century of research which has seen the
development of many types of distorted speech, followed by some
anecdotal evidence that automatic speech recognisers still have some way
to go to match listeners' performance in this area, I will describe the
outcome of two recent perceptual studies into different aspects of
distorted speech. The first makes use of 'sculpted speech', a new form
of distortion that is created by passing an arbitrary signal through a
time-frequency mask representing the target utterance. This study
examines the extent to which the mask alone supports speech perception,
and considers what happens when the generating signal is incongruent
with the target speech. The second study investigates the detailed time
course of a listener's response to different varieties of distorted
speech, addressing the question of just how rapidly we adapt to
previously unheard forms of speech. In parallel, I will mention the
outcome of web-based replications of both experiments, and discuss the
implications for carrying out speech perception experiments outside
traditional laboratory settings.

Martin Cooke is Ikerbasque Research Professor in the Language and Speech
Lab at the University of the Basque Country, Spain. After starting his
career at the UK National Physical Laboratory, he worked at the
University of Sheffield for 26 years before taking up his current
position. His research has focused on computational auditory scene
analysis, human speech perception and algorithms for robust automatic
speech recognition. His interest in
these domains also includes the effects of noise on speech production,
as well as second language listening and acquisition models. He
currently coordinates the EU Marie Curie Network ENRICH, which focuses on
listening effort.