TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

Göreke, H. D., Djupsjöbacka, A., Schenkman, B., Andrén, B., Hermann, D. S., Brunnström, K. (2023). Perceptual Judgments of Simulated Low Temperatures in LCD based Vehicle Displays. Presented at SID International Symposium Digest of Technical Papers, 2023, Los Angeles, United States of America, May 21 2023 - May 26 2023. (pp. 595-598). Wiley.
Yoon, Y., Kucherenko, T., Woo, J., Wolfert, P., Nagy, R., Henter, G. E. (2023). GENEA Workshop 2023 : The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. In ICMI 2023: Proceedings of the 25th International Conference on Multimodal Interaction. (pp. 822-823). Association for Computing Machinery (ACM).
Wolfert, P., Henter, G. E., Belpaeme, T. (2023). "Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion. In ICMI 2023 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction. (pp. 6-10). Association for Computing Machinery (ACM).
Mehta, S., Kirkland, A., Lameris, H., Beskow, J., Székely, É., Henter, G. E. (2023). OverFlow : Putting flows on top of neural transducers for better TTS. In Interspeech 2023. (pp. 4279-4283). International Speech Communication Association.
Willemsen, B., Qian, L., Skantze, G. (2023). Resolving References in Visually-Grounded Dialogue via Text Generation. In Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue. (pp. 457-469). Prague, Czechia: Association for Computational Linguistics (ACL).
Ekström, A. G. (2023). Predicting linguistic universality through reverse engineering. Nature Reviews Psychology, 2(10), 587.
Axelsson, A. (2023). Adaptive Robot Presenters : Modelling Grounding in Multimodal Interaction (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2023:70). Retrieved from
Feindt, K., Rossi, M., Esfandiari-Baiat, G., Ekström, A. G., Zellers, M. (2023). Cues to next-speaker projection in conversational Swedish: Evidence from reaction times. In Interspeech 2023. (pp. 1040-1044). International Speech Communication Association.
Ekstedt, E., Wang, S., Székely, É., Gustafsson, J., Skantze, G. (2023). Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis. In Interspeech 2023. (pp. 5481-5485). International Speech Communication Association.
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2023). An Analysis of Goodness of Pronunciation for Child Speech. In Interspeech 2023. (pp. 4613-4617). International Speech Communication Association.
Rugayan, J., Salvi, G., Svendsen, T. (2023). Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation. In Interspeech 2023. (pp. 2158-2162). International Speech Communication Association.
Pandey, A., Edlund, J., Le Maguer, S., Harte, N. (2023). Listener sensitivity to deviating obstruents in WaveNet. In Interspeech 2023. (pp. 1080-1084). International Speech Communication Association.
Fallgren, P., Edlund, J. (2023). Crowdsource-based validation of the audio cocktail as a sound browsing tool. In Interspeech 2023. (pp. 2178-2182). International Speech Communication Association.
Lameris, H., Gustafsson, J., Székely, É. (2023). Beyond style : synthesizing speech with pragmatic functions. In Interspeech 2023. (pp. 3382-3386). International Speech Communication Association.
Székely, É., Gustafsson, J., Torre, I. (2023). Prosody-controllable gender-ambiguous speech synthesis : a tool for investigating implicit bias in speech perception. In Interspeech 2023. (pp. 1234-1238). International Speech Communication Association.
Kirkland, A., Gustafsson, J., Székely, É. (2023). Pardon my disfluency : The impact of disfluency effects on the perception of speaker competence and confidence. In Interspeech 2023. (pp. 5217-5221). International Speech Communication Association.
Kittimathaveenan, K., Ternström, S. (2023). Localisation in virtual choirs : outcomes of simplified binaural rendering. Presented at Audio Engineering Society Conference: AES 2023 International Conference on Spatial and Immersive Audio, Huddersfield, UK, 23-25 Aug 2023.
Getman, Y., Phan, N., Al-Ghezi, R., Voskoboinik, E., Singh, M., Grosz, T. ... Ylinen, S. (2023). Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children. IEEE Access, 11, 86025-86037.
Tånnander, C., House, D., Edlund, J. (2023). Analysis-by-synthesis : phonetic-phonological variation in deep neural network-based text-to-speech synthesis. In Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023. (pp. 3156-3160). Prague, Czech Republic: GUARANT International.
Sturm, B., Flexer, A. (2023). A Review of Validity and its Relationship to Music Information Research. In Proc. Int. Symp. Music Information Retrieval.
Amerotti, M., Benford, S., Sturm, B., Vear, C. (2023). A Live Performance Rule System Informed by Irish Traditional Dance Music. In Proc. International Symposium on Computer Music Multidisciplinary Research.
Alexanderson, S., Nagy, R., Beskow, J. & Henter, G. E. (2023). Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models. ACM Transactions on Graphics, 42(4).
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS. In ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Institute of Electrical and Electronics Engineers (IEEE).
Sundberg, J., La, F. & Granqvist, S. (2023). Fundamental frequency disturbances in female and male singers' pitch glides through long tube with varied resistances. Journal of the Acoustical Society of America, 154(2), 801-807.
Irfan, B., Ramachandran, A., Staffa, M., Gunes, H. (2023). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI) : Adaptivity for All. In HRI 2023: Companion of the ACM/IEEE International Conference on Human-Robot Interaction. (pp. 929-931). Association for Computing Machinery (ACM).
McMillan, D., Jaber, R., Cowan, B. R., Fischer, J. E., Irfan, B., Cumbal, R., Zargham, N., Lee, M. (2023). Human-Robot Conversational Interaction (HRCI). In HRI 2023: Companion of the ACM/IEEE International Conference on Human-Robot Interaction. (pp. 923-925). Association for Computing Machinery (ACM).
Peña, P. R., Doyle, P. R., Ip, E. Y., Di Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafsson, J., McMillan, D., Moore, R. J., Cowan, B. R. (2023). A Special Interest Group on Developing Theories of Language Use in Interaction with Conversational User Interfaces. In CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).
Gustafsson, J., Székely, É., Beskow, J. (2023). Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters. In 23rd ACM International Conference on Intelligent Virtual Agent (IVA 2023). Institute of Electrical and Electronics Engineers (IEEE).
Axelsson, A., Skantze, G. (2023). Do you follow? : A fully automated system for adaptive robot presenters. In HRI 2023: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 102-111). Association for Computing Machinery (ACM).
Mishra, C., Offrede, T., Fuchs, S., Mooshammer, C. & Skantze, G. (2023). Does a robot's gaze aversion affect human gaze aversion? Frontiers in Robotics and AI, 10.
Borin, L., Domeij, R., Edlund, J. & Forsberg, M. (2023). Language Report Swedish. In Cognitive Technologies (pp. 219-222). Springer Nature.
Nyatsanga, S., Kucherenko, T., Ahuja, C., Henter, G. E. & Neff, M. (2023). A Comprehensive Review of Data-Driven Co-Speech Gesture Generation. Computer graphics forum (Print), 42(2), 569-596.
Pérez Zarazaga, P., Malisz, Z. (2023). Recovering implicit pitch contours from formants in whispered speech. Presented at 20th International Congress of Phonetic Sciences ICPhS 2023, 7-11 August, 2023, Prague, Czech Republic.
Ekström, A. G. & Edlund, J. (2023). Evolution of the human tongue and emergence of speech biomechanics. Frontiers in Psychology, 14.
Leijon, A., von Gablenz, P., Holube, I., Taghia, J. & Smeds, K. (2023). Bayesian analysis of Ecological Momentary Assessment (EMA) data collected in adults before and after hearing rehabilitation. Frontiers in Digital Health, 5.
Pérez Zarazaga, P., Henter, G. E., Malisz, Z. (2023). A processing framework to access large quantities of whispered speech found in ASMR. In ICASSP 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes, Greece: IEEE Signal Processing Society.
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A comparative study of self-supervised speech representations in read and spontaneous TTS. (Manuscript).
Kalpakchi, D., Boye, J. (2023). Quasi : a synthetic Question-Answering dataset in Swedish using GPT-3 and zero-shot learning. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). (pp. 477-491).
Lameris, H., Mehta, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). Prosody-Controllable Spontaneous TTS with Neural HMMs. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Institute of Electrical and Electronics Engineers (IEEE).
Adiban, M., Siniscalchi, S. M. & Salvi, G. (2023). A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity. Neurocomputing, 537, 296-308.
Stenwig, E., Salvi, G., Rossi, P. S. & Skjaervold, N. K. (2023). Comparison of correctly and incorrectly classified patients for in-hospital mortality prediction in the intensive care unit. BMC Medical Research Methodology, 23(1).
Falk, S., Sturm, B., Ahlbäck, S. (2023). Automatic legato transcription based on onset detection. In SMC 2023: Proceedings of the Sound and Music Computing Conference 2023. (pp. 214-221). Sound and Music Computing Network.
Déguernel, K., Sturm, B. (2023). Bias in Favour or Against Computational Creativity : A Survey and Reflection on the Importance of Socio-cultural Context in its Evaluation. In Proc. International Conference on Computational Creativity.
Deichler, A., Wang, S., Alexanderson, S. & Beskow, J. (2023). Learning to generate pointing gestures in situated embodied conversational agents. Frontiers in Robotics and AI, 10.
Huang, R., Holzapfel, A., Sturm, B. & Kaila, A.-K. (2023). Beyond Diverse Datasets : Responsible MIR, Interdisciplinarity, and the Fractured Worlds of Music. Transactions of the International Society for Music Information Retrieval, 6(1), 43-59.
Clemente, A., Friberg, A. & Holzapfel, A. (2023). Relations between perceived affect and liking for melodies and visual designs. Emotion, 23(6), 1584-1605.
Kamelabad, A. M., Skantze, G. (2023). I Learn Better Alone! Collaborative and Individual Word Learning With a Child and Adult Robot. In Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 368-377). New York, NY, United States: Association for Computing Machinery (ACM).
Full list in the KTH publications portal
Page responsible: Web editors at EECS
Belongs to: Speech, Music and Hearing (TMH)
Last changed: Aug 15, 2023