TMH Publications (latest 50)
Below are the 50 latest publications from the Department of Speech, Music and Hearing.
[1]
Vaddadi, B., Axelsson, A., Skantze, G. (2026).
The Role of Social Robots in Autonomous Public Transport.
In Transport Transitions: Advancing Sustainable and Inclusive Mobility: Proceedings of the 10th TRA Conference, 2024, Dublin, Ireland - Volume 1: Safe and Equitable Transport. (pp. 711-716). Springer Nature.
[2]
Moell, B., Aronsson, F. S. & Akbar, S. (2025).
Medical reasoning in LLMs : an in-depth analysis of DeepSeek R1.
Frontiers in Artificial Intelligence, 8.
[3]
Leite, I., Ahlberg, W., Pereira, A., Sestini, A., Gisslen, L., Tollmar, K. (2025).
A Call for Deeper Collaboration Between Robotics and Game Development.
In Proceedings of the IEEE 2025 Conference on Games, CoG 2025. Institute of Electrical and Electronics Engineers (IEEE).
[4]
Walker, R. S., Fleischer, M., Sundberg, J., Bieber, M., Zabel, H. & Mürbe, D. (2025).
Retrospective longitudinal analysis of spectral features reveals divergent vocal development patterns for treble and non-treble singers.
Journal of the Acoustical Society of America, 158(3), 1989-1998.
[5]
Jacka, R., Peña, P. R., Leonard, S. J., Székely, É., Cowan, B. R. (2025).
Impact Of Disfluent Speech Agent On Partner Models And Perspective Taking.
In CUI 2025 - Proceedings of the 2025 ACM Conference on Conversational User Interfaces. Association for Computing Machinery (ACM).
[6]
Friedrichs, D., Ekström, A. G., Nolan, F., Moran, S. & Rosen, S. (2025).
Static spectral cues serve as perceptual anchors in vowel recognition across a broad range of fundamental frequencies.
Journal of the Acoustical Society of America, 158(2), 1560-1572.
[7]
Zellers, M., Gorisch, J. & House, D. (2025).
Temporal relationships between speech and hand gestures in the vicinity of potential turn boundaries in German and Swedish conversation.
Language and Cognition, 17.
[8]
Moell, B. & Sand Aronsson, F. (2025).
Journaling with large language models: a novel UX paradigm for AI-driven personal health management.
Frontiers in Artificial Intelligence, 8.
[9]
Ekström, A. G., Tennie, C., Moran, S. & Everett, C. (2025).
The Phoneme as a Cognitive Tool.
Topics in Cognitive Science.
[10]
Moëll, B. & Sand Aronsson, F. (2025).
Journaling with large language models : a novel UX paradigm for AI-driven personal health management.
Frontiers in Artificial Intelligence, 8.
[11]
Grouwels, J., Jonason, N., Sturm, B. (2025).
Exploring the Expressive Space of an Articulatory Vocal Model using Quality-Diversity Optimization with Multimodal Embeddings.
In GECCO 2025 - Proceedings of the 2025 Genetic and Evolutionary Computation Conference. (pp. 1362-1370). Association for Computing Machinery (ACM).
[12]
Cavalcanti, J. C., Skantze, G. (2025).
"Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking.
In Proceedings of Interspeech 2025. Rotterdam, The Netherlands: ISCA.
[13]
Moëll, B. & Sand Aronsson, F. (2025).
Harm Reduction Strategies for Thoughtful Use of Large Language Models in the Medical Domain : Perspectives for Patients and Clinicians.
Journal of Medical Internet Research, 27.
[14]
Mehta, S. (2025).
Probabilistic Speech & Motion Synthesis : Towards More Expressive and Multimodal Generative Models
(Doctoral thesis, KTH Royal Institute of Technology, TRITA-EECS-AVL 2025:76). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-368342.
[15]
Mehta, S., Gamper, H., Jojic, N. (2025).
Make Some Noise : Towards LLM audio reasoning and generation using sound tokens.
In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 1-5). Institute of Electrical and Electronics Engineers (IEEE).
[16]
Best, P., Araya-Salas, M., Ekström, A. G., Freitas, B., Jensen, F. H., Kershenbaum, A. ... Marxer, R. (2025).
Bioacoustic fundamental frequency estimation : a cross-species dataset and deep learning baseline.
Bioacoustics, 34(4), 419-446.
[17]
Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025).
The AI Music Arms Race : On the Detection of AI-Generated Music.
Transactions of the International Society for Music Information Retrieval, 8(1), 179-194.
[18]
Torubarova, E. (2025).
Brain-Focused Multimodal Approach for Studying Conversational Engagement in HRI.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1894-1896). Institute of Electrical and Electronics Engineers (IEEE).
[19]
Torubarova, E., Arvidsson, C., Berrebi, J., Uddén, J., Abelho Pereira, A. T. (2025).
NeuroEngage: A Multimodal Dataset Integrating fMRI for Analyzing Conversational Engagement in Human-Human and Human-Robot Interactions.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 849-858). Institute of Electrical and Electronics Engineers (IEEE).
[20]
Tuttösí, P., Mehta, S., Syvenky, Z., Burkanova, B., Hfsafsti, M., Wang, Y., Yeung, H. H., Henter, G. E., Aucouturier, J. J., Lim, A. (2025).
Take a Look, it's in a Book, a Reading Robot.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1803-1805). Institute of Electrical and Electronics Engineers (IEEE).
[21]
Irfan, B., Churamani, N., Zhao, M., Ayub, A., Rossi, S. (2025).
Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI) : Overcoming Inequalities with Adaptation.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1970-1972). Institute of Electrical and Electronics Engineers (IEEE).
[22]
Skantze, G., Irfan, B. (2025).
Applying General Turn-Taking Models to Conversational Human-Robot Interaction.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 859-868). Institute of Electrical and Electronics Engineers (IEEE).
[23]
Reimann, M. M., Hindriks, K. V., Kunneman, F. A., Oertel, C., Skantze, G., Leite, I. (2025).
What Can You Say to a Robot? Capability Communication Leads to More Natural Conversations.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 708-716). Institute of Electrical and Electronics Engineers (IEEE).
[24]
Irfan, B., Skantze, G. (2025).
Between You and Me: Ethics of Self-Disclosure in Human-Robot Interaction.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1357-1362). Institute of Electrical and Electronics Engineers (IEEE).
[25]
Janssens, R., Pereira, A., Skantze, G., Irfan, B., Belpaeme, T. (2025).
Online Prediction of User Enjoyment in Human-Robot Dialogue with LLMs.
In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1363-1367). Institute of Electrical and Electronics Engineers (IEEE).
[26]
Cros Vila, L., Sturm, B. (2025).
(Mis)Communicating with our AI Systems.
In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery (ACM).
[27]
Kamelabad, A. M., Inoue, E., Skantze, G. (2025).
Comparing Monolingual and Bilingual Social Robots as Conversational Practice Companions in Language Learning.
In Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 829-838).
[28]
Gonzalez Oliveras, P., Engwall, O. & Wilde, A. (2025).
Social Educational Robotics and Learning Analytics : A Scoping Review of an Emerging Field.
International Journal of Social Robotics.
[29]
Cai, H. & Ternström, S. (2025).
A WaveNet-based model for predicting the electroglottographic signal from the acoustic voice signal.
Journal of the Acoustical Society of America, 157(4), 3033-3044.
[30]
Marcinek, L., Beskow, J., Gustafsson, J. (2025).
A Dual-Control Dialogue Framework for Human-Robot Interaction Data Collection : Integrating Human Emotional and Contextual Awareness with Conversational AI.
In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-297). Springer Nature.
[31]
Mishra, C., Skantze, G., Hagoort, P., Verdonschot, R. (2025).
Perception of Emotions in Human and Robot Faces : Is the Eye Region Enough?
In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-303). Springer Nature.
[32]
Herbst, C. T., Tokuda, I. T., Nishimura, T., Ternström, S., Ossio, V., Levy, M. ... Dunn, J. C. (2025).
‘Monkey yodels’—frequency jumps in New World monkey vocalizations greatly surpass human vocal register transitions.
Philosophical Transactions of the Royal Society of London. Biological Sciences, 380(1923).
[33]
Irfan, B., Kuoppamäki, S., Hosseini, A. & Skantze, G. (2025).
Between reality and delusion : challenges of applying large language models to companion robots for open-domain dialogues with older adults.
Autonomous Robots, 49(1).
[34]
Borg, A., Georg, C., Jobs, B., Huss, V., Waldenlind, K., Ruiz, M. ... Parodis, I. (2025).
Virtual Patient Simulations Using Social Robotics Combined With Large Language Models for Clinical Reasoning Training in Medical Education: Mixed Methods Study.
Journal of Medical Internet Research, 27.
[35]
Cai, H. (2025).
Mapping voice quality in normal, pathological and synthetic voices
(Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2025:25). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-360211.
[36]
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025).
Innovation, data colonialism and ethics : critical reflections on the impacts of AI on Irish traditional music.
Journal of New Music Research, 1-17.
[37]
Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025).
Classification of voice quality using neck-surface acceleration : Comparison with glottal flow and radiated sound.
Journal of Voice, 39(1), 10-24.
[38]
Székely, É., Hope, M. (2024).
An inclusive approach to creating a palette of synthetic voices for gender diversity.
In Proc. Interspeech 2024. (pp. 3070-3074).
[39]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A critical survey of research in music genre recognition.
In Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024. (pp. 745-782). International Society for Music Information Retrieval.
[40]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2024).
A Framework for Phoneme-Level Pronunciation Assessment Using CTC.
In Interspeech 2024. (pp. 302-306). International Speech Communication Association.
[41]
Edlund, J., Tånnander, C., Le Maguer, S., Wagner, P. (2024).
Assessing the impact of contextual framing on subjective TTS quality.
In Interspeech 2024. (pp. 1205-1209). International Speech Communication Association.
[42]
Székely, É., Hope, M. (2024).
An inclusive approach to creating a palette of synthetic voices for gender diversity.
In Interspeech 2024. (pp. 3070-3074). International Speech Communication Association.
[43]
[44]
Kynych, F., Cerva, P., Zdansky, J., Svendsen, T. & Salvi, G. (2024).
A lightweight approach to real-time speaker diarization : from audio toward audio-visual data streams.
EURASIP Journal on Audio, Speech, and Music Processing, 2024(1).
[45]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A Critical Survey of Research in Music Genre Recognition.
In Proc. International Society for Music Information Retrieval Conference. ISMIR.
[46]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024).
AI Music Studies : Preparing for the Coming Flood.
In Proceedings of AI Music Creativity.
[47]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024).
Applying textual inversion to control and personalize text-to-music models.
In Proc. 15th Int. Workshop on Machine Learning and Music.
[48]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024).
An initial exploration of semi-automated tutoring : How AI could be used as support for online human tutors.
In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[49]
Ekström, A. G. (2024).
A Theory That Never Was: Wrong Way to the “Dawn of Speech”.
Biolinguistics, 18.
[50]
Kaila, A.-K., Sturm, B. (2024).
Agonistic Dialogue on the Value and Impact of AI Music Applications.
In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.