TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

Sundberg, J., Salomão, G. L. & Scherer, K. R. (2021). Analyzing Emotion Expression in Singing via Flow Glottograms, Long-Term-Average Spectra, and Expert Listener Evaluation. Journal of Voice, 35(1), 52-60.
Fornhammar, L., Sundberg, J., Fuchs, M. & Pieper, L. (2022). Measuring Voice Effects of Vibrato-Free and Ingressive Singing : A Study of Phonation Threshold Pressures. Journal of Voice, 36(4), 479-486.
Nylén, H., Chatterjee, S., Ternström, S. (2021). Detecting Signal Corruptions in Voice Recordings For Speech Therapy. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 386-390). Institute of Electrical and Electronics Engineers (IEEE).
Kucherenko, T., Jonell, P., Yoon, Y., Wolfert, P., Henter, G. E. (2021). A large, crowdsourced evaluation of gesture generation systems on common data : The GENEA Challenge 2020. In Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces. (pp. 11-21). Association for Computing Machinery (ACM).
Kontogiorgos, D., Tran, M., Gustafsson, J., Soleymani, M. (2021). A Systematic Cross-Corpus Analysis of Human Reactions to Robot Conversational Failures. In ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction. (pp. 112-120). Association for Computing Machinery (ACM).
Huang, R., Sturm, B. L.T., Holzapfel, A. (2021). De-centering the West : East Asian philosophies and the ethics of applying artificial intelligence to music. Presented at International Society for Music Information Retrieval Conference, ISMIR.
Kalpakchi, D., Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. In Proceedings of the 14th International Conference on Natural Language Generation. (pp. 387-403).
Nagy, R., Kucherenko, T., Moell, B., Abelho Pereira, A. T., Kjellström, H., Bernardet, U. (2021). A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents. Presented at 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Sturm, B. & Maruri-Aguilar, H. (2022). The AI Music Generation Challenge 2020 : Double Jigs in the Style of O’Neill’s “1001”. Journal of Creative Music Systems.
Axelsson, A. & Skantze, G. (2022). Multimodal User Feedback During Adaptive Robot-Human Presentations. Frontiers in Computer Science, 3.
Engwall, O., Águas Lopes, J. D. & Cumbal, R. (2022). Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language? International Journal of Social Robotics.
Weldon, C. F., Gillet, S., Cumbal, R., Leite, I. (2021). Exploring non-verbal gaze behavior in groups mediated by an adaptive robot. In ACM/IEEE International Conference on Human-Robot Interaction. (pp. 357-361). IEEE Computer Society.
Skantze, G. (2021). Conversational interaction with social robots. In ACM/IEEE International Conference on Human-Robot Interaction. IEEE Computer Society.
Shahrebabaki, A. S., Salvi, G., Svendsen, T. & Siniscalchi, S. M. (2022). Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models. IEEE/ACM transactions on audio, speech, and language processing, 30, 135-147.
Havel, M., Sundberg, J., Traser, L., Burdumy, M. & Echternach, M. (2021). Effects of Nasalization on Vocal Tract Response Curve. Journal of Voice.
Kontogiorgos, D. (2022). Mutual Understanding in Situated Interactions with Conversational User Interfaces : Theory, Studies, and Computation (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2022-10). Retrieved from
Fallgren, P. (2022). Found speech and humans in the loop : Ways to gain insight into large quantities of speech (Doctoral thesis, KTH Royal Institute of Technology, TRITA-EECS-AVL 2022:13). Retrieved from
De Gooijer, J. G., Henter, G. E. & Yuan, A. (2022). Kernel-based hidden Markov conditional densities. Computational Statistics & Data Analysis, 169.
Ahlberg, S., Axelsson, A., Yu, P., Shaw Cortez, W. E., Gao, Y., Ghadirzadeh, A. ... Dimarogonas, D. V. (2022). Co-adaptive Human-Robot Cooperation : Summary and Challenges. Unmanned Systems, 10(02), 187-203.
Jers, H. & Ternström, S. (2022). Vocal Ensembles : Chapter 20. In Gary E. McPherson (Ed.), The Oxford Handbook of Music Performance, Volume 2 (1 ed., pp. 398-417). Oxford University Press.
Axelsson, A., Buschmeier, H. & Skantze, G. (2022). Modeling Feedback in Interaction With Conversational Agents—A Review. Frontiers in Computer Science, 4.
Blomsma, P., Skantze, G. & Swerts, M. (2022). Backchannel Behavior Influences the Perceived Personality of Human and Artificial Communication Partners. Frontiers in Artificial Intelligence, 5.
Engwall, O., Cumbal, R., Lopes, J., Ljung, M. & Månsson, L. (2022). Identification of Low-engaged Learners in Robot-led Second Language Conversations with Adults. ACM Transactions on Human-Robot Interaction, 11(2).
Edlund, J., Brodén, D., Fridlund, M., Lindhé, C., Olsson, L. -., Ängsal, M., Öhberg, P. (2022). A Multimodal Digital Humanities Study of Terrorism in Swedish Politics : An Interdisciplinary Mixed Methods Project on the Configuration of Terrorism in Parliamentary Debates, Legislation, and Policy Networks 1968–2018. In Lecture Notes in Networks and Systems. (pp. 435-449). Springer Nature.
Nagy, R., Kucherenko, T., Moell, B., Abelho Pereira, A. T., Kjellström, H., Bernardet, U. (2021). A framework for integrating gesture generation models into interactive conversational agents. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. (pp. 1767-1769). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).
Mehta, S., Székely, É., Beskow, J., Henter, G. E. (2022). Neural HMMs are all you need (for high-quality attention-free TTS). In 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 7457-7461). IEEE Signal Processing Society.
Kucherenko, T., Nagy, R., Neff, M., Kjellström, H., Henter, G. E. (2022). Multimodal analysis of the predictability of hand-gesture properties. In AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. (pp. 770-779). ACM Press.
Casini, L., Sturm, B. (2022). Tradformer : A Transformer Model of Traditional Music Transcriptions. Presented at International Joint Conference on Artificial Intelligence, Vienna, Austria 2022.
Sorkhei, M. M., Henter, G. E., Kjellström, H. (2021). Full-Glow : Fully conditional Glow for more realistic image generation. In Pattern Recognition: 43rd DAGM German Conference, DAGM GCPR 2021. (pp. 697-711). Cham, Switzerland: Springer Nature.
Ásgrímsson, D. S., González, I., Salvi, G. & Karoumi, R. (2022). Bayesian Deep Learning for Vibration-Based Bridge Damage Detection. In Structural Integrity (pp. 27-43). Springer Nature.
Lameris, H., Mehta, S., Henter, G. E., Kirkland, A., Moëll, B., O'Regan, J., Gustafsson, J., Székely, É. (2022). Spontaneous Neural HMM TTS with Prosodic Feature Modification. In Proceedings of Fonetik 2022.
Kucherenko, T., Jonell, P., Yoon, Y., Wolfert, P., Yumak, Z., Henter, G. E. (2021). GENEA Workshop 2021 : The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. In Proceedings of ICMI '21: International Conference on Multimodal Interaction. (pp. 872-873). Association for Computing Machinery (ACM).
Cumbal, R. (2022). Adaptive Robot Discourse for Language Acquisition in Adulthood. In Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1158-1160).
Beck, G., Wennberg, U., Malisz, Z., Henter, G. E. (2022). Wavebender GAN : An architecture for phonetically meaningful speech manipulation. Presented at IEEE ICASSP 2022, IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE conference proceedings.
Ward, N., Kirkland, A., Wlodarczak, M., Székely, É. (2022). Two Pragmatic Functions of Breathy Voice in American English Conversation. In Proceedings 11th International Conference on Speech Prosody. (pp. 82-86). International Speech Communication Association.
Beskow, J., Caper, C., Ehrenfors, J., Hagberg, N., Jansen, A., Wood, C. (2021). Expressive robot performance based on facial motion capture. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 2165-2166). International Speech Communication Association.
Deichler, A., Wang, S., Alexanderson, S., Beskow, J. (2022). Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation. Presented at Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at 2022 ACM/IEEE International Conference on Human-Robot Interaction. (p. 2022).
Huang, R. S., Holzapfel, A. & Sturm, B. (2022). Global Ethics : From Philosophy to Practice A Culturally Informed Ethics of Music AI in Asia. In Martin Clancy (Ed.), Artificial Intelligence and Music Ecosystem. Routledge.
Moell, B., O'Regan, J., Mehta, S., Kirkland, A., Lameris, H., Gustafsson, J., Beskow, J. (2022). Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge. In The RaPID4 Workshop: Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments. (pp. 62-70). Marseille, France.
O'Regan, J. (2022). Continued finetuning as single speaker adaptation. In TMH QPSR. Stockholm.
Tånnander, C., Edlund, J. (2022). Mapping specific characteristics of spoken text to listener ratings. In Proceedings of Fonetik 2022. Stockholm, Sweden.
Tånnander, C., Edlund, J. (2022). Sardin : speech-oriented text processing. In Proceedings of Fonetik 2022. Stockholm, Sweden.
Tånnander, C., House, D., Edlund, J. (2022). Syllable duration as a proxy to latent prosodic features. In Proceedings of Speech Prosody 2022. (pp. 220-224). Lisbon, Portugal: International Speech Communication Association.
Skantze, G. & Willemsen, B. (2022). CoLLIE : Continual Learning of Language Grounding from Language-Image Embeddings. The journal of artificial intelligence research, 74, 1201-1223.
Elgarf, M., Zojaji, S., Skantze, G., Peters, C. (2022). CreativeBot : a Creative Storyteller robot to stimulate creativity in children. Presented at the 24th ACM International Conference on Multimodal Interaction (ICMI 2022), Bengaluru (Bangalore), India, 7-11 Nov 2022.
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2022). CPPS and Voice-Source Parameters : Objective Analysis of the Singing Voice. Journal of Voice.
Full list in the KTH publications portal
Page responsible: Web editors at EECS
Belongs to: Speech, Music and Hearing
Last changed: Oct 17, 2018