TMH Publications (latest 50)
Below are the 50 latest publications from the Department of Speech, Music and Hearing.
[1]
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025).
Innovation, data colonialism and ethics : critical reflections on the impacts of AI on Irish traditional music.
Journal of New Music Research, 1-17.
[2]
Sturm, B., Kanhov, E., Holzapfel, A. (Eds.). (2024).
Collected Materials of The First International Conference in AI Music Studies : Prospects, Challenges and Methodologies of Studying AI Music in the Humanities and Social Sciences.
KTH Royal Institute of Technology.
[3]
Deichler, A., Alexanderson, S., Beskow, J. (2024).
Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents.
In Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, IVA 2024. Association for Computing Machinery (ACM).
[4]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2024).
A Framework for Phoneme-Level Pronunciation Assessment Using CTC.
In Interspeech 2024. (pp. 302-306). International Speech Communication Association.
[5]
Blomsma, P., Vaitonytė, J., Skantze, G. & Swerts, M. (2024).
Backchannel behavior is idiosyncratic.
Language and Cognition, 16(4), 1158-1181.
[6]
Edlund, J., Tånnander, C., Le Maguer, S., Wagner, P. (2024).
Assessing the impact of contextual framing on subjective TTS quality.
In Interspeech 2024. (pp. 1205-1209). International Speech Communication Association.
[7]
Qian, L., Skantze, G. (2024).
Joint Learning of Context and Feedback Embeddings in Spoken Dialogue.
In Interspeech 2024. (pp. 2955-2959). International Speech Communication Association.
[8]
Székely, É., Hope, M. (2024).
An inclusive approach to creating a palette of synthetic voices for gender diversity.
In Interspeech 2024. (pp. 3070-3074). International Speech Communication Association.
[9]
Tånnander, C., Mehta, S., Beskow, J., Edlund, J. (2024).
Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis.
In Interspeech 2024. (pp. 2815-2819). International Speech Communication Association.
[10]
Wang, S., Székely, É., Gustafsson, J. (2024).
Contextual Interactive Evaluation of TTS Models in Dialogue Systems.
In Interspeech 2024. (pp. 2965-2969). International Speech Communication Association.
[11]
Lameris, H., Gustafsson, J., Székely, É. (2024).
CreakVC : A Voice Conversion Tool for Modulating Creaky Voice.
In Interspeech 2024. (pp. 1005-1006). International Speech Communication Association.
[12]
Francis, J., Székely, É., Gustafsson, J. (2024).
ConnecTone : A Modular AAC System Prototype with Contextual Generative Text Prediction and Style-Adaptive Conversational TTS.
In Interspeech 2024. (pp. 1001-1002). International Speech Communication Association.
[13]
Kamelabad, A. M., Engwall, O., Skantze, G. (2024).
Conformity and Trust in Multi-party vs. Individual Human-Robot Interaction.
In Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents. New York, NY United States: Association for Computing Machinery (ACM).
[14]
[15]
Ternström, S., Bernardoni, N. H., Birkholz, P., Guasch, O., Gully, A. (Eds.). (2024).
Computational Analysis and Simulation of the Human Voice (Dagstuhl Seminar 24242).
Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
[16]
Werner, A. W., Beskow, J., Deichler, A. (2024).
Gesture Evaluation in Virtual Reality.
In ICMI Companion 2024 - Companion Publication of the 26th International Conference on Multimodal Interaction. (pp. 156-164). Association for Computing Machinery (ACM).
[17]
Kynych, F., Cerva, P., Zdansky, J., Svendsen, T. & Salvi, G. (2024).
A lightweight approach to real-time speaker diarization : from audio toward audio-visual data streams.
EURASIP Journal on Audio, Speech, and Music Processing, 2024(1).
[18]
Kejriwal, J., Mishra, C., Skantze, G., Offrede, T. & Beňuš, Š. (2024).
Does a robot's gaze behavior affect entrainment in HRI?
Computing and informatics, 43(5), 1256-1284.
[19]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A Critical Survey of Research in Music Genre Recognition.
In Proc. International Society for Music Information Retrieval Conference. ISMIR.
[20]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024).
AI Music Studies : Preparing for the Coming Flood.
In Proceedings of AI Music Creativity.
[21]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024).
Applying textual inversion to control and personalize text-to-music models.
In Proc. 15th Int. Workshop on Machine Learning and Music.
[22]
Dalmazzo, D., Déguernel, K., Sturm, B. (2024).
ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding.
In MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania.
[23]
Kanhov, E. (2024).
Entanglements with Deepfake : AI Voice Models and their Diffractive Potential.
Presented at 12th New Materialisms Conference. Intersectional Materialisms: Diversity in Creative Industries, Methods & Practices. 26-28 August, 2024, Kildare, Ireland.
[24]
Borg, A., Jobs, B., Huss, V., Gentline, C., Espinosa, F., Ruiz, M. ... Parodis, I. (2024).
Enhancing clinical reasoning skills for medical students : a qualitative comparison of LLM-powered social robotic versus computer-based virtual patients within rheumatology.
Rheumatology International.
[25]
Mehta, S., Deichler, A., O'Regan, J., Moëll, B., Beskow, J., Henter, G. E., Alexanderson, S. (2024).
Fake it to make it : Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (pp. 1952-1964).
[26]
Wang, Y., Xu, Y., Skantze, G., Buschmeier, H. (2024).
How Much Does Nonverbal Communication Conform to Entropy Rate Constancy? : A Case Study on Listener Gaze in Interaction.
In 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference. (pp. 3533-3545). Association for Computational Linguistics (ACL).
[27]
Rafiei, S., Brunnström, K., Schenkman, B., Andersson, J., Sjöström, M. (2024).
Laboratory study : Human Interaction using Remote Control System for Airport Safety Management.
In 2024 16th International Conference on Quality of Multimedia Experience, QoMEX 2024. (pp. 167-170). Institute of Electrical and Electronics Engineers (IEEE).
[28]
Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M. & Henter, G. E. (2024).
Evaluating Gesture Generation in a Large-scale Open Challenge : The GENEA Challenge 2022.
ACM Transactions on Graphics, 43(3).
[29]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024).
An initial exploration of semi-automated tutoring : How AI could be used as support for online human tutors.
In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[30]
Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024).
Conversational production and comprehension : fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model.
Cerebral Cortex, 34(3).
[31]
Jääskeläinen, P., Kanhov, E. (2024).
Data Ethics and Practices of Human-Nonhuman Sound Technologies and Ecologies.
In VIHAR '24 - 4th International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.
[32]
Ekström, A. G. (2024).
Correcting the record : Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934−2022).
American Journal of Primatology, 86(8).
[33]
Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024).
Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech.
Scientific Reports, 14(1).
[34]
Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024).
Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data.
In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024. (pp. 219-224). Association for Computational Linguistics (ACL).
[35]
Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024).
Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music.
In Proceedings New Interfaces for Musical Expression NIME’24. International Conference on New Interfaces for Musical Expression.
[36]
Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024).
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).
[37]
Wang, S., Székely, É. (2024).
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 6464-6474). European Language Resources Association (ELRA).
[38]
Wennberg, U., Henter, G. E. (2024).
Exploring Internal Numeracy in Language Models: A Case Study on ALBERT.
In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[39]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024).
Introducing the TISMIR Education Track: What, Why, How?
Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[40]
Casini, L., Jonason, N., Sturm, B. (2024).
Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation.
In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.
[41]
Ekström, A. G. (2024).
A Theory That Never Was: Wrong Way to the “Dawn of Speech”.
Biolinguistics, 18.
[42]
Kaila, A.-K., Sturm, B. (2024).
Agonistic Dialogue on the Value and Impact of AI Music Applications.
In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[43]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024).
Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment.
Journal of Speech, Language and Hearing Research, 1-22.
[44]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024).
Effects on Voice Quality of Thyroidectomy : A Qualitative and Quantitative Study Using Voice Maps.
Journal of Voice.
[45]
Borg, A., Parodis, I., Skantze, G. (2024).
Creating Virtual Patients using Robots and Large Language Models : A Preliminary Study with Medical Students.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[46]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024).
Goes to the Heart: Speaking the User's Native Language.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[47]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024).
Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour.
Applied Sciences, 14(4).
[48]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024).
Emotional expressivity in singing : Assessing physiological and acoustic indicators of two opera singers' voice characteristics.
Journal of the Acoustical Society of America, 155(1), 18-28.
[49]
Rosenberg, S., Sundberg, J. & Lã, F. (2024).
Kulning : Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition.
Journal of Voice, 38(3), 585-594.
[50]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024).
CPPS and Voice-Source Parameters : Objective Analysis of the Singing Voice.
Journal of Voice, 38(3), 549-560.