
TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.


[1]
Vaddadi, B., Axelsson, A., Skantze, G. (2026). The Role of Social Robots in Autonomous Public Transport. In Transport Transitions: Advancing Sustainable and Inclusive Mobility: Proceedings of the 10th TRA Conference, 2024, Dublin, Ireland - Volume 1: Safe and Equitable Transport. (pp. 711-716). Springer Nature.
[2]
Ekström, A. G., Karakostis, F. A., Snyder, W. D. & Moran, S. (2025). Rethinking Hominin Air Sac Loss in Light of Phylogenetically Meaningful Evidence. Evolutionary anthropology (Print), 34(3).
[3]
Moell, B., Aronsson, F. S. & Akbar, S. (2025). Medical reasoning in LLMs : an in-depth analysis of DeepSeek R1. Frontiers in Artificial Intelligence, 8.
[4]
Leite, I., Ahlberg, W., Pereira, A., Sestini, A., Gisslen, L., Tollmar, K. (2025). A Call for Deeper Collaboration Between Robotics and Game Development. In Proceedings of the IEEE 2025 Conference on Games, CoG 2025. Institute of Electrical and Electronics Engineers (IEEE).
[5]
Walker, R. S., Fleischer, M., Sundberg, J., Bieber, M., Zabel, H. & Mürbe, D. (2025). Retrospective longitudinal analysis of spectral features reveals divergent vocal development patterns for treble and non-treble singers. Journal of the Acoustical Society of America, 158(3), 1989-1998.
[6]
Jacka, R., Peña, P. R., Leonard, S. J., Székely, É., Cowan, B. R. (2025). Impact Of Disfluent Speech Agent On Partner Models And Perspective Taking. In CUI 2025 - Proceedings of the 2025 ACM Conference on Conversational User Interfaces. Association for Computing Machinery (ACM).
[7]
Friedrichs, D., Ekström, A. G., Nolan, F., Moran, S. & Rosen, S. (2025). Static spectral cues serve as perceptual anchors in vowel recognition across a broad range of fundamental frequencies. Journal of the Acoustical Society of America, 158(2), 1560-1572.
[9]
Ekström, A. G., Tennie, C., Moran, S. & Everett, C. (2025). The Phoneme as a Cognitive Tool. Topics in Cognitive Science.
[10]
Moëll, B. & Sand Aronsson, F. (2025). Journaling with large language models : a novel UX paradigm for AI-driven personal health management. Frontiers in Artificial Intelligence, 8.
[11]
Grouwels, J., Jonason, N., Sturm, B. (2025). Exploring the Expressive Space of an Articulatory Vocal Model using Quality-Diversity Optimization with Multimodal Embeddings. In GECCO 2025 - Proceedings of the 2025 Genetic and Evolutionary Computation Conference. (pp. 1362-1370). Association for Computing Machinery (ACM).
[12]
Cavalcanti, J. C., Skantze, G. (2025). "Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking. In Proceedings of Interspeech 2025. Rotterdam, The Netherlands: ISCA.
[14]
Mehta, S. (2025). Probabilistic Speech & Motion Synthesis : Towards More Expressive and Multimodal Generative Models (Doctoral thesis, KTH Royal Institute of Technology, TRITA-EECS-AVL 2025:76). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-368342.
[15]
Mehta, S., Gamper, H., Jojic, N. (2025). Make Some Noise : Towards LLM audio reasoning and generation using sound tokens. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 1-5). Institute of Electrical and Electronics Engineers (IEEE).
[16]
Best, P., Araya-Salas, M., Ekström, A. G., Freitas, B., Jensen, F. H., Kershenbaum, A. ... Marxer, R. (2025). Bioacoustic fundamental frequency estimation : a cross-species dataset and deep learning baseline. Bioacoustics, 34(4), 419-446.
[17]
Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025). The AI Music Arms Race : On the Detection of AI-Generated Music. Transactions of the International Society for Music Information Retrieval, 8(1), 179-194.
[18]
Torubarova, E. (2025). Brain-Focused Multimodal Approach for Studying Conversational Engagement in HRI. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1894-1896). Institute of Electrical and Electronics Engineers (IEEE).
[19]
Torubarova, E., Arvidsson, C., Berrebi, J., Uddén, J., Abelho Pereira, A. T. (2025). NeuroEngage: A Multimodal Dataset Integrating fMRI for Analyzing Conversational Engagement in Human-Human and Human-Robot Interactions. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 849-858). Institute of Electrical and Electronics Engineers (IEEE).
[20]
Tuttösí, P., Mehta, S., Syvenky, Z., Burkanova, B., Hfsafsti, M., Wang, Y., Yeung, H. H., Henter, G. E., Aucouturier, J. J., Lim, A. (2025). Take a Look, it's in a Book, a Reading Robot. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1803-1805). Institute of Electrical and Electronics Engineers (IEEE).
[21]
Irfan, B., Churamani, N., Zhao, M., Ayub, A., Rossi, S. (2025). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI) : Overcoming Inequalities with Adaptation. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1970-1972). Institute of Electrical and Electronics Engineers (IEEE).
[22]
Skantze, G., Irfan, B. (2025). Applying General Turn-Taking Models to Conversational Human-Robot Interaction. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 859-868). Institute of Electrical and Electronics Engineers (IEEE).
[23]
Reimann, M. M., Hindriks, K. V., Kunneman, F. A., Oertel, C., Skantze, G., Leite, I. (2025). What Can You Say to a Robot? Capability Communication Leads to More Natural Conversations. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 708-716). Institute of Electrical and Electronics Engineers (IEEE).
[24]
Irfan, B., Skantze, G. (2025). Between You and Me: Ethics of Self-Disclosure in Human-Robot Interaction. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1357-1362). Institute of Electrical and Electronics Engineers (IEEE).
[25]
Janssens, R., Pereira, A., Skantze, G., Irfan, B., Belpaeme, T. (2025). Online Prediction of User Enjoyment in Human-Robot Dialogue with LLMs. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1363-1367). Institute of Electrical and Electronics Engineers (IEEE).
[26]
Cros Vila, L., Sturm, B. (2025). (Mis)Communicating with our AI Systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery (ACM).
[27]
Kamelabad, A. M., Inoue, E., Skantze, G. (2025). Comparing Monolingual and Bilingual Social Robots as Conversational Practice Companions in Language Learning. In Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 829-838).
[28]
Gonzalez Oliveras, P., Engwall, O. & Wilde, A. (2025). Social Educational Robotics and Learning Analytics : A Scoping Review of an Emerging Field. International Journal of Social Robotics.
[29]
Cai, H. & Ternström, S. (2025). A WaveNet-based model for predicting the electroglottographic signal from the acoustic voice signal. Journal of the Acoustical Society of America, 157(4), 3033-3044.
[30]
Marcinek, L., Beskow, J., Gustafsson, J. (2025). A Dual-Control Dialogue Framework for Human-Robot Interaction Data Collection : Integrating Human Emotional and Contextual Awareness with Conversational AI. In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-297). Springer Nature.
[31]
Mishra, C., Skantze, G., Hagoort, P., Verdonschot, R. (2025). Perception of Emotions in Human and Robot Faces : Is the Eye Region Enough?. In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-303). Springer Nature.
[32]
Herbst, C. T., Tokuda, I. T., Nishimura, T., Ternström, S., Ossio, V., Levy, M. ... Dunn, J. C. (2025). ‘Monkey yodels’—frequency jumps in New World monkey vocalizations greatly surpass human vocal register transitions. Philosophical Transactions of the Royal Society of London. Biological Sciences, 380(1923).
[34]
Borg, A., Georg, C., Jobs, B., Huss, V., Waldenlind, K., Ruiz, M. ... Parodis, I. (2025). Virtual Patient Simulations Using Social Robotics Combined With Large Language Models for Clinical Reasoning Training in Medical Education: Mixed Methods Study. Journal of Medical Internet Research, 27.
[35]
Cai, H. (2025). Mapping voice quality in normal, pathological and synthetic voices (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2025:25). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-360211.
[36]
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025). Innovation, data colonialism and ethics : critical reflections on the impacts of AI on Irish traditional music. Journal of New Music Research, 1-17.
[37]
Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025). Classification of voice quality using neck-surface acceleration : Comparison with glottal flow and radiated sound. Journal of Voice, 39(1), 10-24.
[38]
Székely, É., Hope, M. (2024). An inclusive approach to creating a palette of synthetic voices for gender diversity. In Interspeech 2024. (pp. 3070-3074). International Speech Communication Association.
[39]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024). A critical survey of research in music genre recognition. In Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024. (pp. 745-782). International Society for Music Information Retrieval.
[40]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2024). A Framework for Phoneme-Level Pronunciation Assessment Using CTC. In Interspeech 2024. (pp. 302-306). International Speech Communication Association.
[41]
Edlund, J., Tånnander, C., Le Maguer, S., Wagner, P. (2024). Assessing the impact of contextual framing on subjective TTS quality. In Interspeech 2024. (pp. 1205-1209). International Speech Communication Association.
[44]
Kynych, F., Cerva, P., Zdansky, J., Svendsen, T. & Salvi, G. (2024). A lightweight approach to real-time speaker diarization : from audio toward audio-visual data streams. EURASIP Journal on Audio, Speech, and Music Processing, 2024(1).
[46]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024). AI Music Studies : Preparing for the Coming Flood. In Proceedings of AI Music Creativity.
[47]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024). Applying textual inversion to control and personalize text-to-music models. In Proc. 15th Int. Workshop on Machine Learning and Music.
[48]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024). An initial exploration of semi-automated tutoring : How AI could be used as support for online human tutors. In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[49]
Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18.
[50]
Kaila, A.-K., Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
Full list in the KTH publications portal