TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]

Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024). Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data. In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024. (pp. 219-224). Association for Computational Linguistics (ACL).

[2]

Mehta, S., Tu, R., Beskow, J., Székely, É., Henter, G. E. (2024). Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. (pp. 11341-11345). Institute of Electrical and Electronics Engineers (IEEE).

[3]

Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024). Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music. In Proceedings New Interfaces for Musical Expression NIME’24.

[4]

Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024). DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input. In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).

[5]

Tånnander, C., O'Regan, J., House, D., Edlund, J., Beskow, J. (2024). Prosodic characteristics of English-accented Swedish neural TTS. In Proceedings of Speech Prosody 2024. (pp. 1035-1039). Leiden, The Netherlands: International Speech Communication Association.

[6]

Misra, S., Boye, J. (2024). Nested Noun Phrase Identification using BERT. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 12138-12143). European Language Resources Association (ELRA).

[7]

Malisz, Z., Foremski, J., Kul, M. (2024). PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 13068-13073). European Language Resources Association (ELRA).

[8]

Inoue, K., Jiang, B., Ekstedt, E., Kawahara, T., Skantze, G. (2024). Multilingual Turn-taking Prediction Using Voice Activity Projection. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 11873-11883). European Language Resources Association (ELRA).

[9]

Tånnander, C., Edlund, J., Gustafsson, J. (2024). Revisiting Three Text-to-Speech Synthesis Experiments with a Web-Based Audience Response System. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 14111-14121). European Language Resources Association (ELRA).

[10]

Wang, S., Székely, É. (2024). Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 6464-6474). European Language Resources Association (ELRA).

[11]

Lameris, H., Székely, É., Gustafsson, J. (2024). The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 16058-16065). European Language Resources Association (ELRA).

[12]

Irfan, B., Kuoppamäki, S. & Skantze, G. (2024). Recommendations for designing conversational companion robots with older adults through foundation models. Frontiers in Robotics and AI, 11.

[13]

Wennberg, U., Henter, G. E. (2024). Exploring Internal Numeracy in Language Models: A Case Study on ALBERT. In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).

[14]

Esfandiari-Baiat, G., Edlund, J. (2024). The MEET Corpus: Collocated, Distant and Hybrid Three-party Meetings with a Ranking Task. In ISA 2024: 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation at LREC-COLING 2024, Workshop Proceedings. (pp. 1-7). European Language Resources Association (ELRA).

[15]

Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024). Introducing the TISMIR Education Track: What, Why, How? Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.

[16]

Casini, L., Jonason, N., Sturm, B. (2024). Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation. In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.

[17]

Dalmazzo, D., Deguernel, K., Sturm, B. (2024). The Chordinator: Modeling Music Harmony by Implementing Transformer Networks and Token Strategies. In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 52-66). Springer Nature.

[18]

Ekström, A. G. (2024). A Theory That Never Was: Wrong Way to the “Dawn of Speech”. Biolinguistics, 18.

[19]

Kaila, A.-K., Sturm, B. (2024). Agonistic Dialogue on the Value and Impact of AI Music Applications. In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.

[20]

Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024). Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment. Journal of Speech, Language and Hearing Research, 1-22.

[21]

Ternström, S. (2024). Pragmatic De-Noising of Electroglottographic Signals. Bioengineering, 11(5), 479.

[22]

Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024). Effects on Voice Quality of Thyroidectomy: A Qualitative and Quantitative Study Using Voice Maps. Journal of Voice.

[23]

Traum, D., Skantze, G., Nishizaki, H., Higashinaka, R., Minato, T. & Nagai, T. (2024). Special issue on multimodal processing and robotics for dialogue systems (Part II). Advanced Robotics, 38(4), 193-194.

[24]

Borg, A., Parodis, I., Skantze, G. (2024). Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).

[25]

Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024). Goes to the Heart: Speaking the User's Native Language. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).

[26]

Kamelabad, A. M. (2024). The Question Is Not Whether; It Is How! In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 112-114). Association for Computing Machinery (ACM).

[27]

Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning. In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).

[28]

Axelsson, A., Vaddadi, B., Bogdan, C. M., Skantze, G. (2024). Robots in autonomous buses: Who hosts when no human is there? In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1278-1280). Association for Computing Machinery (ACM).

[29]

Wolfert, P., Henter, G. E. & Belpaeme, T. (2024). Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour. Applied Sciences, 14(4).

[30]

Mehta, S., Frisk, K. & Nyborg, L. (2024). Role of Cr in Mn-rich precipitates for Al–Mn–Cr–Zr-based alloys tailored for additive manufacturing. Calphad, 84.

[31]

Cumbal, R., Engwall, O. (2024). Speaking Transparently: Social Robots in Educational Settings. In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24 Companion), March 11-14, 2024, Boulder, CO, USA.

[32]

Cumbal, R. (2024). Robots Beyond Borders: The Role of Social Robots in Spoken Second Language Practice (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2024:23). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-343863.

[33]

Ternström, S. (2024). Update 3.1 to FonaDyn: A system for real-time analysis of the electroglottogram, over the voice range. SoftwareX, 26.

[34]

Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024). Emotional expressivity in singing: Assessing physiological and acoustic indicators of two opera singers' voice characteristics. Journal of the Acoustical Society of America, 155(1), 18-28.

[35]

Kalpakchi, D. & Boye, J. (2024). Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies. Natural Language Engineering, 217-255.

[36]

Rosenberg, S., Sundberg, J. & Lã, F. (2024). Kulning: Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition. Journal of Voice, 38(3), 585-594.

[37]

Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024). CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice. Journal of Voice, 38(3), 549-560.

[38]

Körner Gustafsson, J., Södersten, M., Ternström, S. & Schalling, E. (2024). Treatment of Hypophonia in Parkinson’s Disease Through Biofeedback in Daily Life Administered with A Portable Voice Accumulator. Journal of Voice, 38(3), 800.e27-800.e38.

[39]

Wolfert, P., Henter, G. E., Belpaeme, T. (2023). "Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion. In ICMI 2023 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction. (pp. 6-10). Association for Computing Machinery (ACM).

[40]

Axelsson, A. (2023). Adaptive Robot Presenters: Modelling Grounding in Multimodal Interaction (Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2023:70). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-338178.

[41]

Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2023). An Analysis of Goodness of Pronunciation for Child Speech. In Interspeech 2023. (pp. 4613-4617). International Speech Communication Association.

[42]

Tånnander, C., House, D., Edlund, J. (2023). Analysis-by-synthesis: phonetic-phonological variation in deep neural network-based text-to-speech synthesis. In Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023. (pp. 3156-3160). Prague, Czech Republic: GUARANT International.

[43]

Sturm, B., Flexer, A. (2023). A Review of Validity and its Relationship to Music Information Research. In Proc. Int. Symp. Music Information Retrieval.

[44]

Amerotti, M., Benford, S., Sturm, B., Vear, C. (2023). A Live Performance Rule System Informed by Irish Traditional Dance Music. In Proc. International Symposium on Computer Music Multidisciplinary Research.

[45]

Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS. In ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Institute of Electrical and Electronics Engineers (IEEE).

[46]

Peña, P. R., Doyle, P. R., Ip, E. Y., Di Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafsson, J., McMillan, D., Moore, R. J., Cowan, B. R. (2023). A Special Interest Group on Developing Theories of Language Use in Interaction with Conversational User Interfaces. In CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).

[47]

Nyatsanga, S., Kucherenko, T., Ahuja, C., Henter, G. E. & Neff, M. (2023). A Comprehensive Review of Data-Driven Co-Speech Gesture Generation. Computer graphics forum (Print), 42(2), 569-596.

[48]

Pérez Zarazaga, P., Henter, G. E., Malisz, Z. (2023). A processing framework to access large quantities of whispered speech found in ASMR. In ICASSP 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes, Greece: IEEE Signal Processing Society.

[49]

Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A comparative study of self-supervised speech representations in read and spontaneous TTS. (Manuscript).

[50]

Adiban, M., Siniscalchi, S. M. & Salvi, G. (2023). A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity. Neurocomputing, 537, 296-308.

Full list in the KTH publications portal
