TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2021). Analyzing Emotion Expression in Singing via Flow Glottograms, Long-Term-Average Spectra, and Expert Listener Evaluation. Journal of Voice, 35(1), 52-60.
[2]
Fornhammar, L., Sundberg, J., Fuchs, M. & Pieper, L. (2022). Measuring Voice Effects of Vibrato-Free and Ingressive Singing : A Study of Phonation Threshold Pressures. Journal of Voice, 36(4), 479-486.
[3]
Nylén, H., Chatterjee, S., Ternström, S. (2021). Detecting Signal Corruptions in Voice Recordings For Speech Therapy. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 386-390). Institute of Electrical and Electronics Engineers (IEEE).
[4]
Kucherenko, T., Jonell, P., Yoon, Y., Wolfert, P., Henter, G. E. (2021). A large, crowdsourced evaluation of gesture generation systems on common data : The GENEA Challenge 2020. In Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces. (pp. 11-21). Association for Computing Machinery (ACM).
[6]
Kontogiorgos, D., Tran, M., Gustafsson, J., Soleymani, M. (2021). A Systematic Cross-Corpus Analysis of Human Reactions to Robot Conversational Failures. In ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction. (pp. 112-120). Association for Computing Machinery (ACM).
[7]
Huang, R., Sturm, B. L.T., Holzapfel, A. (2021). De-centering the West : East Asian philosophies and the ethics of applying artificial intelligence to music. Presented at International Society for Music Information Retrieval Conference, ISMIR.
[8]
Kalpakchi, D., Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. In Proceedings of the 14th International Conference on Natural Language Generation. (pp. 387-403).
[9]
Nagy, R., Kucherenko, T., Moell, B., Abelho Pereira, A. T., Kjellström, H., Bernardet, U. (2021). A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents. Presented at 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
[10]
Sturm, B. & Maruri-Aguilar, H. (2022). The Ai Music Generation Challenge 2020 : Double Jigs in the Style of O’Neill’s “1001”. Journal of Creative Music Systems.
[11]
Axelsson, A. & Skantze, G. (2022). Multimodal User Feedback During Adaptive Robot-Human Presentations. Frontiers in Computer Science, 3.
[12]
Engwall, O., Águas Lopes, J. D. & Cumbal, R. (2022). Is a Wizard-of-Oz Required for Robot-Led Conversation Practice in a Second Language? International Journal of Social Robotics.
[13]
Weldon, C. F., Gillet, S., Cumbal, R., Leite, I. (2021). Exploring non-verbal gaze behavior in groups mediated by an adaptive robot. In ACM/IEEE International Conference on Human-Robot Interaction. (pp. 357-361). IEEE Computer Society.
[14]
Skantze, G. (2021). Conversational interaction with social robots. In ACM/IEEE International Conference on Human-Robot Interaction. IEEE Computer Society.
[15]
Shahrebabaki, A. S., Salvi, G., Svendsen, T. & Siniscalchi, S. M. (2022). Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 135-147.
[16]
Havel, M., Sundberg, J., Traser, L., Burdumy, M. & Echternach, M. (2021). Effects of Nasalization on Vocal Tract Response Curve. Journal of Voice.
[17]
Kontogiorgos, D. (2022). Mutual Understanding in Situated Interactions with Conversational User Interfaces : Theory, Studies, and Computation (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2022-10). Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-308927.
[18]
Fallgren, P. (2022). Found speech and humans in the loop : Ways to gain insight into large quantities of speech (Doctoral thesis, KTH Royal Institute of Technology, TRITA-EECS-AVL 2022:13). Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-309031.
[19]
De Gooijer, J. G., Henter, G. E. & Yuan, A. (2022). Kernel-based hidden Markov conditional densities. Computational Statistics & Data Analysis, 169.
[21]
Ahlberg, S., Axelsson, A., Yu, P., Shaw Cortez, W. E., Gao, Y., Ghadirzadeh, A. ... Dimarogonas, D. V. (2022). Co-adaptive Human-Robot Cooperation : Summary and Challenges. Unmanned Systems, 10(02), 187-203.
[22]
Jers, H. & Ternström, S. (2022). Vocal Ensembles : Chapter 20. In Gary E. McPherson (Ed.), The Oxford Handbook of Music Performance, Volume 2 (1 ed., pp. 398-417). Oxford University Press.
[23]
Axelsson, A., Buschmeier, H. & Skantze, G. (2022). Modeling Feedback in Interaction With Conversational Agents—A Review. Frontiers in Computer Science, 4.
[24]
Blomsma, P., Skantze, G. & Swerts, M. (2022). Backchannel Behavior Influences the Perceived Personality of Human and Artificial Communication Partners. Frontiers in Artificial Intelligence, 5.
[26]
Engwall, O., Cumbal, R., Lopes, J., Ljung, M. & Månsson, L. (2022). Identification of Low-engaged Learners in Robot-led Second Language Conversations with Adults. ACM Transactions on Human-Robot Interaction, 11(2).
[27]
Edlund, J., Brodén, D., Fridlund, M., Lindhé, C., Olsson, L. -., Ängsal, M., Öhberg, P. (2022). A Multimodal Digital Humanities Study of Terrorism in Swedish Politics : An Interdisciplinary Mixed Methods Project on the Configuration of Terrorism in Parliamentary Debates, Legislation, and Policy Networks 1968–2018. In Lecture Notes in Networks and Systems. (pp. 435-449). Springer Nature.
[28]
Nagy, R., Kucherenko, T., Moell, B., Abelho Pereira, A. T., Kjellström, H., Bernardet, U. (2021). A framework for integrating gesture generation models into interactive conversational agents. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. (pp. 1767-1769). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).
[29]
Mehta, S., Székely, É., Beskow, J., Henter, G. E. (2022). Neural HMMs are all you need (for high-quality attention-free TTS). In 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 7457-7461). IEEE Signal Processing Society.
[30]
Kucherenko, T., Nagy, R., Neff, M., Kjellström, H., Henter, G. E. (2022). Multimodal analysis of the predictability of hand-gesture properties. In AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. (pp. 770-779). ACM Press.
[31]
Casini, L., Sturm, B. (2022). Tradformer : A Transformer Model of Traditional Music Transcriptions. Presented at International Joint Conference on Artificial Intelligence, Vienna, Austria 2022.
[32]
Sorkhei, M. M., Henter, G. E., Kjellström, H. (2021). Full-Glow : Fully conditional Glow for more realistic image generation. In Pattern Recognition: 43rd DAGM German Conference, DAGM GCPR 2021. (pp. 697-711). Cham, Switzerland: Springer Nature.
[33]
Ásgrímsson, D. S., González, I., Salvi, G. & Karoumi, R. (2022). Bayesian Deep Learning for Vibration-Based Bridge Damage Detection. In Structural Integrity (pp. 27-43). Springer Nature.
[34]
Lameris, H., Mehta, S., Henter, G. E., Kirkland, A., Moëll, B., O'Regan, J., Gustafsson, J., Székely, É. (2022). Spontaneous Neural HMM TTS with Prosodic Feature Modification. In Proceedings of Fonetik 2022.
[35]
Kucherenko, T., Jonell, P., Yoon, Y., Wolfert, P., Yumak, Z., Henter, G. E. (2021). GENEA Workshop 2021 : The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. In Proceedings of ICMI '21: International Conference on Multimodal Interaction. (pp. 872-873). Association for Computing Machinery (ACM).
[36]
Cumbal, R. (2022). Adaptive Robot Discourse for Language Acquisition in Adulthood. In Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1158-1160).
[37]
Beck, G., Wennberg, U., Malisz, Z., Henter, G. E. (2022). Wavebender GAN : An architecture for phonetically meaningful speech manipulation. Presented at IEEE ICASSP 2022, IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE conference proceedings.
[38]
Ward, N., Kirkland, A., Wlodarczak, M., Székely, É. (2022). Two Pragmatic Functions of Breathy Voice in American English Conversation. In Proceedings 11th International Conference on Speech Prosody. (pp. 82-86). International Speech Communication Association.
[39]
Beskow, J., Caper, C., Ehrenfors, J., Hagberg, N., Jansen, A., Wood, C. (2021). Expressive robot performance based on facial motion capture. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 2165-2166). International Speech Communication Association.
[40]
Deichler, A., Wang, S., Alexanderson, S., Beskow, J. (2022). Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation. Presented at Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at 2022 ACM/IEEE International Conference on Human-Robot Interaction. (p. 2022).
[41]
Huang, R. S., Holzapfel, A. & Sturm, B. (2022). Global Ethics : From Philosophy to Practice A Culturally Informed Ethics of Music AI in Asia. In Martin Clancy (Ed.), Artificial Intelligence and Music Ecosystem. Routledge.
[42]
Moell, B., O'Regan, J., Mehta, S., Kirkland, A., Lameris, H., Gustafsson, J., Beskow, J. (2022). Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge. In The RaPID4 Workshop: Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments. (pp. 62-70). Marseille, France.
[43]
O'Regan, J. (2022). Continued finetuning as single speaker adaptation. In TMH QPSR. Stockholm.
[44]
Tånnander, C., Edlund, J. (2022). Mapping specific characteristics of spoken text to listener ratings. In Proceedings of Fonetik 2022. Stockholm, Sweden.
[45]
Tånnander, C., Edlund, J. (2022). Sardin : speech-oriented text processing. In Proceedings of Fonetik 2022. Stockholm, Sweden.
[46]
Tånnander, C., House, D., Edlund, J. (2022). Syllable duration as a proxy to latent prosodic features. In Proceedings of Speech Prosody 2022. (pp. 220-224). Lisbon, Portugal: International Speech Communication Association.
[47]
Skantze, G. & Willemsen, B. (2022). CoLLIE : Continual Learning of Language Grounding from Language-Image Embeddings. The Journal of Artificial Intelligence Research, 74, 1201-1223.
[49]
Elgarf, M., Zojaji, S., Skantze, G., Peters, C. (2022). CreativeBot : a Creative Storyteller robot to stimulate creativity in children. Presented at the 24th ACM International Conference on Multimodal Interaction (ICMI 2022), Bengaluru (Bangalore), India, 7-11 Nov 2022.
[50]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2022). CPPS and Voice-Source Parameters : Objective Analysis of the Singing Voice. Journal of Voice.
Full list in the KTH publications portal
Page responsible: Web editors at EECS
Belongs to: Speech, Music and Hearing
Last changed: Oct 17, 2018