TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]
Wennberg, U., Henter, G. E. (2024). Exploring Internal Numeracy in Language Models: A Case Study on ALBERT. In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[2]
Esfandiari-Baiat, G., Edlund, J. (2024). The MEET Corpus: Collocated, Distant and Hybrid Three-party Meetings with a Ranking Task. In ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings. (pp. 1-7). European Language Resources Association (ELRA).
[3]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024). Introducing the TISMIR Education Track: What, Why, How?. Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[4]
Casini, L., Jonason, N., Sturm, B. (2024). Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation. In ARTIFICIAL INTELLIGENCE IN MUSIC, SOUND, ART AND DESIGN, EVOMUSART 2024. (pp. 84-96). Springer Nature.
[5]
Dalmazzo, D., Deguernel, K., Sturm, B. (2024). The Chordinator : Modeling Music Harmony by Implementing Transformer Networks and Token Strategies. In ARTIFICIAL INTELLIGENCE IN MUSIC, SOUND, ART AND DESIGN, EVOMUSART 2024. (pp. 52-66). Springer Nature.
[6]
Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18.
[7]
Kaila, A.-K., Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[8]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024). Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment. Journal of Speech, Language and Hearing Research, 1-22.
[9]
Ternström, S. (2024). Pragmatic De-Noising of Electroglottographic Signals. Bioengineering, 11(5), 479.
[10]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024). Effects on Voice Quality of Thyroidectomy : A Qualitative and Quantitative Study Using Voice Maps. Journal of Voice.
[11]
Traum, D., Skantze, G., Nishizaki, H., Higashinaka, R., Minato, T. & Nagai, T. (2024). Special issue on multimodal processing and robotics for dialogue systems (Part II). Advanced Robotics, 38(4), 193-194.
[12]
Borg, A., Parodis, I., Skantze, G. (2024). Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[13]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024). Goes to the Heart: Speaking the User's Native Language. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[14]
Kamelabad, A. M. (2024). The Question Is Not Whether; It Is How!. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 112-114). Association for Computing Machinery (ACM).
[15]
Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).
[16]
Axelsson, A., Vaddadi, B., Bogdan, C. M., Skantze, G. (2024). Robots in autonomous buses: Who hosts when no human is there?. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1278-1280). Association for Computing Machinery (ACM).
[17]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024). Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour. Applied Sciences, 14(4).
[19]
Cumbal, R., Engwall, O. (2024). Speaking Transparently : Social Robots in Educational Settings. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24 Companion), March 11-14, 2024, Boulder, CO, USA.
[20]
Cumbal, R. (2024). Robots Beyond Borders : The Role of Social Robots in Spoken Second Language Practice (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2024:23). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-343863.
[22]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024). Emotional expressivity in singing : Assessing physiological and acoustic indicators of two opera singers' voice characteristics. Journal of the Acoustical Society of America, 155(1), 18-28.
[24]
Deichler, A., Mehta, S., Alexanderson, S., Beskow, J. (2023). Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation. In PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023. (pp. 755-762). Association for Computing Machinery (ACM).
[25]
Torre, I., Lagerstedt, E., Dennler, N., Seaborn, K., Leite, I., Székely, É. (2023). Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?. In 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN. (pp. 106-112). Institute of Electrical and Electronics Engineers (IEEE).
[26]
D'Amario, S., Ternström, S., Goebl, W. & Bishop, L. (2023). Body motion of choral singers. Frontiers in Psychology, 14.
[27]
Figueroa, C., Ochs, M., Skantze, G. (2023). Classification of Feedback Functions in Spoken Dialog Using Large Language Models and Prosodic Features. In 27th Workshop on the Semantics and Pragmatics of Dialogue. (pp. 15-24). Maribor: University of Maribor.
[28]
Wolfert, P., Henter, G. E., Belpaeme, T. (2023). "Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion. In ICMI 2023 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction. (pp. 6-10). Association for Computing Machinery (ACM).
[29]
Axelsson, A. (2023). Adaptive Robot Presenters : Modelling Grounding in Multimodal Interaction (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2023:70). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-338178.
[30]
Feindt, K., Rossi, M., Esfandiari-Baiat, G., Ekström, A. G., Zellers, M. (2023). Cues to next-speaker projection in conversational Swedish: Evidence from reaction times. In Interspeech 2023. (pp. 1040-1044). International Speech Communication Association.
[31]
Ekstedt, E., Wang, S., Székely, É., Gustafsson, J., Skantze, G. (2023). Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis. In Interspeech 2023. (pp. 5481-5485). International Speech Communication Association.
[32]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2023). An Analysis of Goodness of Pronunciation for Child Speech. In Interspeech 2023. (pp. 4613-4617). International Speech Communication Association.
[33]
Fallgren, P., Edlund, J. (2023). Crowdsource-based validation of the audio cocktail as a sound browsing tool. In Interspeech 2023. (pp. 2178-2182). International Speech Communication Association.
[34]
Lameris, H., Gustafsson, J., Székely, É. (2023). Beyond style : synthesizing speech with pragmatic functions. In Interspeech 2023. (pp. 3382-3386). International Speech Communication Association.
[36]
Getman, Y., Phan, N., Al-Ghezi, R., Voskoboinik, E., Singh, M., Grosz, T. ... Ylinen, S. (2023). Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children. IEEE Access, 11, 86025-86037.
[37]
Tånnander, C., House, D., Edlund, J. (2023). Analysis-by-synthesis : phonetic-phonological variation in deep neural network-based text-to-speech synthesis. In Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023. (pp. 3156-3160). Prague, Czech Republic: GUARANT International.
[38]
Sturm, B., Flexer, A. (2023). A Review of Validity and its Relationship to Music Information Research. In Proc. Int. Symp. Music Information Retrieval.
[39]
Amerotti, M., Benford, S., Sturm, B., Vear, C. (2023). A Live Performance Rule System Informed by Irish Traditional Dance Music. In Proc. International Symposium on Computer Music Multidisciplinary Research.
[40]
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS. In ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Institute of Electrical and Electronics Engineers (IEEE).
[41]
Peña, P. R., Doyle, P. R., Ip, E. Y., Di Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafsson, J., McMillan, D., Moore, R. J., Cowan, B. R. (2023). A Special Interest Group on Developing Theories of Language Use in Interaction with Conversational User Interfaces. In CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).
[42]
Nyatsanga, S., Kucherenko, T., Ahuja, C., Henter, G. E. & Neff, M. (2023). A Comprehensive Review of Data-Driven Co-Speech Gesture Generation. Computer graphics forum (Print), 42(2), 569-596.
[43]
Leijon, A., von Gablenz, P., Holube, I., Taghia, J. & Smeds, K. (2023). Bayesian analysis of Ecological Momentary Assessment (EMA) data collected in adults before and after hearing rehabilitation. Frontiers in Digital Health, 5.
[44]
Pérez Zarazaga, P., Henter, G. E., Malisz, Z. (2023). A processing framework to access large quantities of whispered speech found in ASMR. In ICASSP 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes, Greece: IEEE Signal Processing Society.
[45]
Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A comparative study of self-supervised speech representations in read and spontaneous TTS. (Manuscript).
[46]
Adiban, M., Siniscalchi, S. M. & Salvi, G. (2023). A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity. Neurocomputing, 537, 296-308.
[47]
Stenwig, E., Salvi, G., Rossi, P. S. & Skjaervold, N. K. (2023). Comparison of correctly and incorrectly classified patients for in-hospital mortality prediction in the intensive care unit. BMC Medical Research Methodology, 23(1).
[48]
Falk, S., Sturm, B., Ahlbäck, S. (2023). Automatic legato transcription based on onset detection. In SMC 2023: Proceedings of the Sound and Music Computing Conference 2023. (pp. 214-221). Sound and Music Computing Network.
[49]
Déguernel, K., Sturm, B. (2023). Bias in Favour or Against Computational Creativity : A Survey and Reflection on the Importance of Socio-cultural Context in its Evaluation. In Proc. International Conference on Computational Creativity.
[50]
Huang, R., Holzapfel, A., Sturm, B. & Kaila, A.-K. (2023). Beyond Diverse Datasets : Responsible MIR, Interdisciplinarity, and the Fractured Worlds of Music. Transactions of the International Society for Music Information Retrieval, 6(1), 43-59.
Full list in the KTH publications portal