TMH Publications (latest 50)
Below are the 50 latest publications from the Department of Speech, Music and Hearing.
[1]
Cai, H. & Ternström, S. (2025).
A WaveNet-based model for predicting the electroglottographic signal from the acoustic voice signal.
Journal of the Acoustical Society of America, 157(4), 3033-3044.
[2]
Herbst, C. T., Tokuda, I. T., Nishimura, T., Ternström, S., Ossio, V., Levy, M. ... Dunn, J. C. (2025).
‘Monkey yodels’—frequency jumps in New World monkey vocalizations greatly surpass human vocal register transitions.
Philosophical Transactions of the Royal Society of London. Biological Sciences, 380(1923).
[3]
Irfan, B., Kuoppamäki, S., Hosseini, A. & Skantze, G. (2025).
Between reality and delusion: challenges of applying large language models to companion robots for open-domain dialogues with older adults.
Autonomous Robots, 49(1).
[4]
Borg, A., Georg, C., Jobs, B., Huss, V., Waldenlind, K., Ruiz, M. ... Parodis, I. (2025).
Virtual Patient Simulations Using Social Robotics Combined With Large Language Models for Clinical Reasoning Training in Medical Education: Mixed Methods Study.
Journal of Medical Internet Research, 27.
[5]
Cai, H. (2025).
Mapping voice quality in normal, pathological and synthetic voices
(Doctoral thesis, KTH Royal Institute of Technology, Stockholm, TRITA-EECS-AVL 2025:25). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-360211.
[6]
Kanhov, E., Kaila, A.-K. & Sturm, B. L. T. (2025).
Innovation, data colonialism and ethics: critical reflections on the impacts of AI on Irish traditional music.
Journal of New Music Research, 1-17.
[7]
Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025).
Classification of voice quality using neck-surface acceleration: Comparison with glottal flow and radiated sound.
Journal of Voice, 39(1), 10-24.
[8]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A critical survey of research in music genre recognition.
In Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024 (pp. 745-782). International Society for Music Information Retrieval.
[9]
Kaila, A.-K., Kanhov, E., Sturm, B. (2024).
Ethnographic Considerations and Critical Reflections on the Impacts of AI on Traditional Irish Music.
Presented at the British Forum for Ethnomusicology & International Council for Traditional Music Ireland Joint-Annual Conference, Cork, Ireland, University College Cork, 4-7 April 2024.
[10]
Sturm, B., Kanhov, E., Holzapfel, A. (Eds.). (2024).
Collected Materials of The First International Conference in AI Music Studies: Prospects, Challenges and Methodologies of Studying AI Music in the Humanities and Social Sciences.
KTH Royal Institute of Technology.
[11]
Cao, X., Fan, Z., Svendsen, T., Salvi, G. (2024).
A Framework for Phoneme-Level Pronunciation Assessment Using CTC.
In Interspeech 2024 (pp. 302-306). International Speech Communication Association.
[12]
Blomsma, P., Vaitonyté, J., Skantze, G. & Swerts, M. (2024).
Backchannel behavior is idiosyncratic.
Language and Cognition, 16(4), 1158-1181.
[13]
Edlund, J., Tånnander, C., Le Maguer, S., Wagner, P. (2024).
Assessing the impact of contextual framing on subjective TTS quality.
In Interspeech 2024 (pp. 1205-1209). International Speech Communication Association.
[14]
Székely, É., Hope, M. (2024).
An inclusive approach to creating a palette of synthetic voices for gender diversity.
In Interspeech 2024 (pp. 3070-3074). International Speech Communication Association.
[15]
Tånnander, C., Mehta, S., Beskow, J., Edlund, J. (2024).
Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis.
In Interspeech 2024 (pp. 2815-2819). International Speech Communication Association.
[16]
Wang, S., Székely, É., Gustafsson, J. (2024).
Contextual Interactive Evaluation of TTS Models in Dialogue Systems.
In Interspeech 2024 (pp. 2965-2969). International Speech Communication Association.
[17]
Lameris, H., Gustafsson, J., Székely, É. (2024).
CreakVC: A Voice Conversion Tool for Modulating Creaky Voice.
In Interspeech 2024 (pp. 1005-1006). International Speech Communication Association.
[18]
Francis, J., Székely, É., Gustafsson, J. (2024).
ConnecTone: A Modular AAC System Prototype with Contextual Generative Text Prediction and Style-Adaptive Conversational TTS.
In Interspeech 2024 (pp. 1001-1002). International Speech Communication Association.
[19]
Kamelabad, A. M., Engwall, O., Skantze, G. (2024).
Conformity and Trust in Multi-party vs. Individual Human-Robot Interaction.
In Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents. New York, NY, United States: Association for Computing Machinery (ACM).
[20]
[21]
Ternström, S., Bernardoni, N. H., Birkholz, P., Guasch, O., Gully, A. (Eds.). (2024).
Computational Analysis and Simulation of the Human Voice (Dagstuhl Seminar 24242).
Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
[22]
Werner, A. W., Beskow, J., Deichler, A. (2024).
Gesture Evaluation in Virtual Reality.
In ICMI Companion 2024 - Companion Publication of the 26th International Conference on Multimodal Interaction (pp. 156-164). Association for Computing Machinery (ACM).
[23]
Kynych, F., Cerva, P., Zdansky, J., Svendsen, T. & Salvi, G. (2024).
A lightweight approach to real-time speaker diarization: from audio toward audio-visual data streams.
EURASIP Journal on Audio, Speech, and Music Processing, 2024(1).
[24]
Kejriwal, J., Mishra, C., Skantze, G., Offrede, T. & Beňuš, Š. (2024).
Does a robot's gaze behavior affect entrainment in HRI?
Computing and informatics, 43(5), 1256-1284.
[25]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A Critical Survey of Research in Music Genre Recognition.
In Proc. International Society for Music Information Retrieval Conference. ISMIR.
[26]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024).
AI Music Studies: Preparing for the Coming Flood.
In Proceedings of AI Music Creativity.
[27]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024).
Applying textual inversion to control and personalize text-to-music models.
In Proc. 15th Int. Workshop on Machine Learning and Music.
[28]
Dalmazzo, D., Déguernel, K., Sturm, B. (2024).
ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding.
In MML 2024: 15th International Workshop on Machine Learning and Music, 2024, Vilnius, Lithuania.
[29]
Kanhov, E. (2024).
Entanglements with Deepfake: AI Voice Models and their Diffractive Potential.
Presented at the 12th New Materialisms Conference. Intersectional Materialisms: Diversity in Creative Industries, Methods & Practices. 26-28 August 2024, Kildare, Ireland.
[30]
Borg, A., Jobs, B., Huss, V., Gentline, C., Espinosa, F., Ruiz, M. ... Parodis, I. (2024).
Enhancing clinical reasoning skills for medical students: a qualitative comparison of LLM-powered social robotic versus computer-based virtual patients within rheumatology.
Rheumatology International, 44(12), 3041-3051.
[31]
Mehta, S., Deichler, A., O'Regan, J., Moëll, B., Beskow, J., Henter, G. E., Alexanderson, S. (2024).
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1952-1964).
[32]
Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M. & Henter, G. E. (2024).
Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022.
ACM Transactions on Graphics, 43(3).
[33]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024).
An initial exploration of semi-automated tutoring: How AI could be used as support for online human tutors.
In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[34]
Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024).
Conversational production and comprehension: fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model.
Cerebral Cortex, 34(3).
[35]
Jääskeläinen, P., Kanhov, E. (2024).
Data Ethics and Practices of Human-Nonhuman Sound Technologies and Ecologies.
In VIHAR '24 - 4th International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.
[36]
Ekström, A. G. (2024).
Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934–2022).
American Journal of Primatology, 86(8).
[37]
Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024).
Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech.
Scientific Reports, 14(1).
[38]
Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024).
Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data.
In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024 (pp. 219-224). Association for Computational Linguistics (ACL).
[39]
Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024).
Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music.
In Proceedings New Interfaces for Musical Expression NIME’24. International Conference on New Interfaces for Musical Expression.
[40]
Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024).
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).
[41]
Wang, S., Székely, É. (2024).
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp. 6464-6474). European Language Resources Association (ELRA).
[42]
Wennberg, U., Henter, G. E. (2024).
Exploring Internal Numeracy in Language Models: A Case Study on ALBERT.
In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings (pp. 35-40). European Language Resources Association (ELRA).
[43]
Ekström, A. G. (2024).
A Theory That Never Was: Wrong Way to the “Dawn of Speech”.
Biolinguistics, 18.
[44]
Kaila, A.-K., Sturm, B. (2024).
Agonistic Dialogue on the Value and Impact of AI Music Applications.
In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[45]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024).
Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment.
Journal of Speech, Language and Hearing Research, 67(6), 1660-1681.
[46]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024).
Effects on Voice Quality of Thyroidectomy: A Qualitative and Quantitative Study Using Voice Maps.
Journal of Voice.
[47]
Borg, A., Parodis, I., Skantze, G. (2024).
Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (pp. 273-277). Association for Computing Machinery (ACM).
[48]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024).
Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour.
Applied Sciences, 14(4).
[49]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024).
Emotional expressivity in singing: Assessing physiological and acoustic indicators of two opera singers' voice characteristics.
Journal of the Acoustical Society of America, 155(1), 18-28.
[50]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024).
CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice.
Journal of Voice, 38(3), 549-560.