
TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.


[1]
Bokkahalli Satish, S. H., Henter, G. E., Székely, É. (2026). When Voice Matters : Evidence of Gender Disparity in Positional Bias of SpeechLLMs. In Speech and Computer - 27th International Conference, SPECOM 2025, Proceedings. (pp. 25-38). Springer Nature.
[2]
Amerotti, M., Benford, S., Sturm, B. L.T., Vear, C. (2026). A Live Performance Rule System Informed by Irish Traditional Dance Music. In Music and Sound Generation in the AI Era - 16th International Symposium, CMMR 2023, Revised Selected Papers. (pp. 127-139). Springer Nature.
[3]
Vaddadi, B., Axelsson, A., Skantze, G. (2026). The Role of Social Robots in Autonomous Public Transport. In Transport Transitions: Advancing Sustainable and Inclusive Mobility: Proceedings of the 10th TRA Conference, 2024, Dublin, Ireland - Volume 1: Safe and Equitable Transport. (pp. 711-716). Springer Nature.
[4]
Qian, L., Figueroa, C., Skantze, G. (2025). Representation of perceived prosodic similarity of conversational feedback. In Interspeech 2025. (pp. 374-378). International Speech Communication Association.
[5]
Bokkahalli Satish, S. H., Henter, G. E., Székely, É. (2025). Hear Me Out : Interactive evaluation and bias discovery platform for speech-to-speech conversational AI. In Interspeech 2025. (pp. 2151-2152). International Speech Communication Association.
[6]
Netzorg, R., Carvalho, N., Guzman, A., Wang, L., Francis, J., Garoute, K. V., Johnson, K., Anumanchipalli, G. K. (2025). On the Production and Perception of a Single Speaker's Gender. In Interspeech 2025. (pp. 669-673). International Speech Communication Association.
[7]
Park, M., Ontakhrai, S., Kittimathaveenan, K., Alfredsson, J., Ternström, S. (2025). How to make closed-back headphones transparent for a vocalist’s own direct sound. Presented at AES 159th Convention 2025 October 23–25, Long Beach, CA, USA. (p. 8). Audio Engineering Society, Inc.
[8]
Tånnander, C., House, D., Beskow, J., Edlund, J. (2025). Intrasentential English in Swedish TTS : perceived English-accentedness. In Interspeech 2025. (pp. 1638-1642). International Speech Communication Association.
[9]
Malisz, Z., Foremski, J., Kul, M. (2025). Contextual predictability effects on acoustic distinctiveness in read Polish speech. In Interspeech 2025. (pp. 335-339). International Speech Communication Association.
[10]
Thulinsson, F., Söderlund, N., Rafiei, S., Schenkman, B., Djupsjöbacka, A., Andrén, B., Brunnström, K. (2025). Impact of Camera height and Field-of-View on distance judgement and gap selection in digital rear-view mirrors in vehicles. In IS and T International Symposium on Electronic Imaging Science and Technology. Society for Imaging Science & Technology.
[11]
Lodagala, V. S., Alkanhal, L., Izham, D., Mehta, S., Chowdhury, S., Makki, A., Hussein, H. S., Henter, G. E., Ali, A. (2025). SawtArabi : A Benchmark Corpus for Arabic TTS. Standard, Dialectal and Code-Switching. In Interspeech 2025. (pp. 4793-4797). International Speech Communication Association.
[12]
Cros Vila, L. (2025). Perspectives on AI and Music : Representation, Detection, and Explanation in the Age of AI-Generated Music (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, Sweden, TRITA-EECS-AVL 2025:104). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-372384.
[13]
Ekström, A. G., Gärdenfors, P., Snyder, W. D., Friedrichs, D., McCarthy, R. C., Tsapos, M. ... Moran, S. (2025). Correlates of Vocal Tract Evolution in Late Pliocene and Pleistocene Hominins. Human Nature, 36(1), 22-69.
[14]
Moëll, B. (2025). Evaluation of Artificial Intelligence in the Medical Domain : Speech, Language and Applications (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2025:83). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-371738.
[16]
Moell, B. & Sand Aronsson, F. (2025). Automatic Evaluation of the Pataka Test Using Machine Learning and Audio Signal Processing. Acta Logopaedica, 2.
[17]
Ekström, A. G., Karakostis, F. A., Snyder, W. D. & Moran, S. (2025). Rethinking Hominin Air Sac Loss in Light of Phylogenetically Meaningful Evidence. Evolutionary anthropology (Print), 34(3).
[18]
Moell, B., Aronsson, F. S. & Akbar, S. (2025). Medical reasoning in LLMs : an in-depth analysis of DeepSeek R1. Frontiers in Artificial Intelligence, 8.
[19]
Leite, I., Ahlberg, W., Pereira, A., Sestini, A., Gisslen, L., Tollmar, K. (2025). A Call for Deeper Collaboration Between Robotics and Game Development. In Proceedings of the IEEE 2025 Conference on Games, CoG 2025. Institute of Electrical and Electronics Engineers (IEEE).
[20]
Walker, R. S., Fleischer, M., Sundberg, J., Bieber, M., Zabel, H. & Mürbe, D. (2025). Retrospective longitudinal analysis of spectral features reveals divergent vocal development patterns for treble and non-treble singers. Journal of the Acoustical Society of America, 158(3), 1989-1998.
[21]
Jacka, R., Peña, P. R., Leonard, S. J., Székely, É., Cowan, B. R. (2025). Impact Of Disfluent Speech Agent On Partner Models And Perspective Taking. In CUI 2025 - Proceedings of the 2025 ACM Conference on Conversational User Interfaces. Association for Computing Machinery (ACM).
[22]
Friedrichs, D., Ekström, A. G., Nolan, F., Moran, S. & Rosen, S. (2025). Static spectral cues serve as perceptual anchors in vowel recognition across a broad range of fundamental frequencies. Journal of the Acoustical Society of America, 158(2), 1560-1572.
[24]
Ekström, A. G., Tennie, C., Moran, S. & Everett, C. (2025). The Phoneme as a Cognitive Tool. Topics in Cognitive Science.
[25]
Moëll, B. & Sand Aronsson, F. (2025). Journaling with large language models : a novel UX paradigm for AI-driven personal health management. Frontiers in Artificial Intelligence, 8.
[26]
Grouwels, J., Jonason, N., Sturm, B. (2025). Exploring the Expressive Space of an Articulatory Vocal Modal using Quality-Diversity Optimization with Multimodal Embeddings. In GECCO 2025 - Proceedings of the 2025 Genetic and Evolutionary Computation Conference. (pp. 1362-1370). Association for Computing Machinery (ACM).
[27]
Cavalcanti, J. C., Skantze, G. (2025). "Dyadosyncrasy", Idiosyncrasy and Demographic Factors in Turn-Taking. In Proceedings of the Interspeech 2025. Rotterdam, The Netherlands: International Speech Communication Association.
[29]
Mehta, S. (2025). Probabilistic Speech & Motion Synthesis : Towards More Expressive and Multimodal Generative Models (Doctoral thesis, KTH Royal Institute of Technology, TRITA-EECS-AVL 2025:76). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-368342.
[30]
Mehta, S., Gamper, H., Jojic, N. (2025). Make Some Noise : Towards LLM audio reasoning and generation using sound tokens. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 1-5). Institute of Electrical and Electronics Engineers (IEEE).
[31]
Best, P., Araya-Salas, M., Ekström, A. G., Freitas, B., Jensen, F. H., Kershenbaum, A. ... Marxer, R. (2025). Bioacoustic fundamental frequency estimation : a cross-species dataset and deep learning baseline. Bioacoustics, 34(4), 419-446.
[32]
Cros Vila, L., Sturm, B., Casini, L. & Dalmazzo, D. (2025). The AI Music Arms Race : On the Detection of AI-Generated Music. Transactions of the International Society for Music Information Retrieval, 8(1), 179-194.
[33]
Torubarova, E. (2025). Brain-Focused Multimodal Approach for Studying Conversational Engagement in HRI. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1894-1896). Institute of Electrical and Electronics Engineers (IEEE).
[34]
Torubarova, E., Arvidsson, C., Berrebi, J., Uddén, J., Abelho Pereira, A. T. (2025). NeuroEngage: A Multimodal Dataset Integrating fMRI for Analyzing Conversational Engagement in Human-Human and Human-Robot Interactions. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 849-858). Institute of Electrical and Electronics Engineers (IEEE).
[35]
Tuttösí, P., Mehta, S., Syvenky, Z., Burkanova, B., Hfsafsti, M., Wang, Y., Yeung, H. H., Henter, G. E., Aucouturier, J. J., Lim, A. (2025). Take a Look, it's in a Book, a Reading Robot. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1803-1805). Institute of Electrical and Electronics Engineers (IEEE).
[36]
Irfan, B., Churamani, N., Zhao, M., Ayub, A., Rossi, S. (2025). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI) : Overcoming Inequalities with Adaptation. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1970-1972). Institute of Electrical and Electronics Engineers (IEEE).
[37]
Skantze, G., Irfan, B. (2025). Applying General Turn-Taking Models to Conversational Human-Robot Interaction. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 859-868). Institute of Electrical and Electronics Engineers (IEEE).
[38]
Irfan, B., Skantze, G. (2025). Between You and Me: Ethics of Self-Disclosure in Human-Robot Interaction. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1357-1362). Institute of Electrical and Electronics Engineers (IEEE).
[39]
Janssens, R., Pereira, A., Skantze, G., Irfan, B., Belpaeme, T. (2025). Online Prediction of User Enjoyment in Human-Robot Dialogue with LLMs. In HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1363-1367). Institute of Electrical and Electronics Engineers (IEEE).
[40]
Cros Vila, L., Sturm, B. (2025). (Mis)Communicating with our AI Systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery (ACM).
[41]
Kamelabad, A. M., Inoue, E., Skantze, G. (2025). Comparing Monolingual and Bilingual Social Robots as Conversational Practice Companions in Language Learning. In Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 829-838).
[42]
Gonzalez Oliveras, P., Engwall, O. & Wilde, A. (2025). Social Educational Robotics and Learning Analytics : A Scoping Review of an Emerging Field. International Journal of Social Robotics.
[43]
Cai, H. & Ternström, S. (2025). A WaveNet-based model for predicting the electroglottographic signal from the acoustic voice signal. Journal of the Acoustical Society of America, 157(4), 3033-3044.
[44]
Marcinek, L., Beskow, J., Gustafsson, J. (2025). A Dual-Control Dialogue Framework for Human-Robot Interaction Data Collection : Integrating Human Emotional and Contextual Awareness with Conversational AI. In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-297). Springer Nature.
[45]
Mishra, C., Skantze, G., Hagoort, P., Verdonschot, R. (2025). Perception of Emotions in Human and Robot Faces : Is the Eye Region Enough? In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-303). Springer Nature.
[46]
Herbst, C. T., Tokuda, I. T., Nishimura, T., Ternström, S., Ossio, V., Levy, M. ... Dunn, J. C. (2025). ‘Monkey yodels’—frequency jumps in New World monkey vocalizations greatly surpass human vocal register transitions. Philosophical Transactions of the Royal Society of London. Biological Sciences, 380(1923).
[47]
[48]
Cai, H. (2025). Mapping voice quality in normal, pathological and synthetic voices (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2025:25). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-360211.
[49]
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025). Innovation, data colonialism and ethics : critical reflections on the impacts of AI on Irish traditional music. Journal of New Music Research, 1-17.
[50]
Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025). Classification of voice quality using neck-surface acceleration : Comparison with glottal flow and radiated sound. Journal of Voice, 39(1), 10-24.
Full list in KTH's publication portal