Speech, Music and Hearing (TMH)

Research at the Division of Speech, Music and Hearing (TMH) is multidisciplinary, spanning linguistics, phonetics, auditory perception, vision and experimental psychology. Rooted in an engineering modelling approach, our research forms a solid base for developing multimodal human-computer interaction systems in which speech, music, sound and gestures combine to create human-like communication.

The division is part of the Department of Intelligent Systems at the School of Electrical Engineering and Computer Science.

Research Area

Conversational Systems

Human Speech and Communication

Music Informatics and Auditory Perception

Speech and Language Technologies

Social Robotics

Voice Science and Technical Vocology

Latest Publications

[1] Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024). Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data. In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024. (pp. 219-224). Association for Computational Linguistics (ACL).

[2] Mehta, S., Tu, R., Beskow, J., Székely, É., Henter, G. E. (2024). Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. (pp. 11341-11345). Institute of Electrical and Electronics Engineers (IEEE).

[3] Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024). Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music. In Proceedings New Interfaces for Musical Expression NIME’24.

[4] Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024). DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input. In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).

[5] Tånnander, C., O'Regan, J., House, D., Edlund, J., Beskow, J. (2024). Prosodic characteristics of English-accented Swedish neural TTS. In Proceedings of Speech Prosody 2024. (pp. 1035-1039). Leiden, The Netherlands: International Speech Communication Association.

Full list in the KTH publications portal

At TMH we regularly host seminars given by talented researchers. Check our calendar for upcoming seminars, or register as a speaker.

TMH Seminar Speaker Registration

Events

No up-to-date calendar events right now.

https://www.kth.se/is/tmh/calendar

News

How to predict a conversation
26 Sep 2022

The SIGDIAL best paper award went to Erik Ekstedt and Gabriel Skantze from Speech, Music and Hearing (TMH). Their model learns to predict what will happen in the next two seconds of the conversation. ...

Research on generating a faster iteration and a more personal voice for digital assistants
24 Jan 2022

Congratulations to Shivam Mehta, doctoral student at the Division of Speech, Music and Hearing, on winning the poster exhibition at the EECS Winter Conference.

ICMI 2021 Best Paper Award Nomination!
21 Oct 2021

TMH gets Jury award at IVA Gala 2021
7 Oct 2021

TMH gets Honourable Mention at IVA 2021!
7 Oct 2021
