TMH Publications (latest 50)
Below are the 50 latest publications from the Department of Speech, Music and Hearing.
[1]
Kamelabad, A. M., Engwall, O., Skantze, G. (2024).
Conformity and Trust in Multi-party vs. Individual Human-Robot Interaction.
In Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents. New York, NY, United States: Association for Computing Machinery (ACM).
[2]
[3]
Ternström, S., Bernardoni, N. H., Birkholz, P., Guasch, O., Gully, A. (Eds.). (2024).
Computational Analysis and Simulation of the Human Voice (Dagstuhl Seminar 24242).
Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
[4]
Werner, A. W., Beskow, J., Deichler, A. (2024).
Gesture Evaluation in Virtual Reality.
In ICMI Companion 2024 - Companion Publication of the 26th International Conference on Multimodal Interaction. (pp. 156-164). Association for Computing Machinery (ACM).
[5]
Kynych, F., Cerva, P., Zdansky, J., Svendsen, T. & Salvi, G. (2024).
A lightweight approach to real-time speaker diarization : from audio toward audio-visual data streams.
EURASIP Journal on Audio, Speech, and Music Processing, 2024(1).
[6]
Kejriwal, J., Mishra, C., Skantze, G., Offrede, T. & Beňuš, Š. (2024).
Does a robot's gaze behavior affect entrainment in HRI?
Computing and Informatics, 43(5), 1256-1284.
[7]
[8]
Green, O., Sturm, B., Born, G., Wald-Fuhrmann, M. (2024).
A Critical Survey of Research in Music Genre Recognition.
In Proc. International Society for Music Information Retrieval Conference. ISMIR.
[9]
Sturm, B., Déguernel, K., Huang, R. S., Kaila, A.-K., Jääskeläinen, P., Kanhov, E., Cros Vila, L., Dalmazzo, D., Casini, L., Bown, O., Collins, N., Drott, E., Sterne, J., Holzapfel, A., Ben-Tal, O. (2024).
AI Music Studies : Preparing for the Coming Flood.
In Proceedings of AI Music Creativity.
[10]
Thomé, C., Sturm, B., Pertoft, J., Jonason, N. (2024).
Applying textual inversion to control and personalize text-to-music models.
In Proc. 15th Int. Workshop on Machine Learning and Music.
[11]
Dalmazzo, D., Déguernel, K., Sturm, B. (2024).
ChromaFlow: Modeling And Generating Harmonic Progressions With a Transformer And Voicing Encoding.
In MML 2024: 15th International Workshop on Machine Learning and Music, Vilnius, Lithuania.
[12]
Kanhov, E. (2024).
Entanglements with Deepfake : AI Voice Models and their Diffractive Potential.
Presented at 12th New Materialisms Conference. Intersectional Materialisms: Diversity in Creative Industries, Methods & Practices. 26-28 August, 2024, Kildare, Ireland.
[13]
Borg, A., Jobs, B., Huss, V., Gentline, C., Espinosa, F., Ruiz, M. ... Parodis, I. (2024).
Enhancing clinical reasoning skills for medical students : a qualitative comparison of LLM-powered social robotic versus computer-based virtual patients within rheumatology.
Rheumatology International.
[14]
Mehta, S., Deichler, A., O'Regan, J., Moëll, B., Beskow, J., Henter, G. E., Alexanderson, S. (2024).
Fake it to make it : Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (pp. 1952-1964).
[15]
Benford, S., Amerotti, M., Sturm, B., Avila, J. M. (2024).
Negotiating Autonomy and Trust when Performing with an AI Musician.
In TAS 2024 - Proceedings of the 2nd International Symposium on Trustworthy Autonomous Systems. Association for Computing Machinery (ACM).
[16]
Wang, Y., Xu, Y., Skantze, G., Buschmeier, H. (2024).
How Much Does Nonverbal Communication Conform to Entropy Rate Constancy? : A Case Study on Listener Gaze in Interaction.
In 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference. (pp. 3533-3545). Association for Computational Linguistics (ACL).
[17]
Engström, H., Włodarczak, M., Ternström, S. (2024).
Mapping the effect of body position : Voice quality differences in connected speech.
In Proceedings of FONETIK 2024, Stockholm, June 3-5, 2024. (pp. 21-26). Stockholm University.
[18]
Rafiei, S., Brunnström, K., Schenkman, B., Andersson, J., Sjöström, M. (2024).
Laboratory study : Human Interaction using Remote Control System for Airport Safety Management.
In 2024 16th International Conference on Quality of Multimedia Experience, QoMEX 2024. (pp. 167-170). Institute of Electrical and Electronics Engineers (IEEE).
[19]
Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M. & Henter, G. E. (2024).
Evaluating Gesture Generation in a Large-scale Open Challenge : The GENEA Challenge 2022.
ACM Transactions on Graphics, 43(3).
[20]
Jansson, M., Tian, K., Hrastinski, S., Engwall, O. (2024).
An initial exploration of semi-automated tutoring : How AI could be used as support for online human tutors.
In Proceedings of the Fourteenth International Conference on Networked Learning. Aalborg University.
[21]
Arvidsson, C., Torubarova, E., Abelho Pereira, A. T. & Udden, J. (2024).
Conversational production and comprehension : fMRI-evidence reminiscent of but deviant from the classical Broca-Wernicke model.
Cerebral Cortex, 34(3).
[22]
Jääskeläinen, P., Kanhov, E. (2024).
Data Ethics and Practices of Human-Nonhuman Sound Technologies and Ecologies.
In VIHAR '24 - 4th International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.
[23]
Ekström, A. (2024).
Phonetic potential in the extant apes and extinct hominins
(Doctoral thesis, KTH Royal Institute of Technology, Stockholm, Sweden, TRITA-EECS-AVL 55). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-351250.
[24]
Ekström, A. G. (2024).
Correcting the record : Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934–2022).
American Journal of Primatology, 86(8).
[25]
Ekström, A. G., Gannon, C., Edlund, J., Moran, S. & Lameira, A. R. (2024).
Chimpanzee utterances refute purported missing links for novel vocalizations and syllabic speech.
Scientific Reports, 14(1).
[26]
Malmberg, F., Klezovich, A., Mesch, J., Beskow, J. (2024).
Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data.
In 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024. (pp. 219-224). Association for Computational Linguistics (ACL).
[27]
Mehta, S., Tu, R., Beskow, J., Székely, É., Henter, G. E. (2024).
Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching.
In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. (pp. 11341-11345). Institute of Electrical and Electronics Engineers (IEEE).
[28]
Amerotti, M., Sturm, B., Benford, S., Maruri-Aguilar, H., Vear, C. (2024).
Evaluation of an Interactive Music Performance System in the Context of Irish Traditional Dance Music.
In Proceedings New Interfaces for Musical Expression NIME’24. International Conference on New Interfaces for Musical Expression.
[29]
Jonason, N., Wang, X., Cooper, E., Juvela, L., Sturm, B., Yamagishi, J. (2024).
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
In Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24).
[30]
Tånnander, C., O'Regan, J., House, D., Edlund, J., Beskow, J. (2024).
Prosodic characteristics of English-accented Swedish neural TTS.
In Proceedings of Speech Prosody 2024. (pp. 1035-1039). Leiden, The Netherlands: International Speech Communication Association.
[31]
Misra, S., Boye, J. (2024).
Nested Noun Phrase Identification using BERT.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 12138-12143). European Language Resources Association (ELRA).
[32]
Malisz, Z., Foremski, J., Kul, M. (2024).
PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 13068-13073). European Language Resources Association (ELRA).
[33]
Inoue, K., Jiang, B., Ekstedt, E., Kawahara, T., Skantze, G. (2024).
Multilingual Turn-taking Prediction Using Voice Activity Projection.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 11873-11883). European Language Resources Association (ELRA).
[34]
Wang, S., Székely, É. (2024).
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model.
In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 6464-6474). European Language Resources Association (ELRA).
[35]
Wennberg, U., Henter, G. E. (2024).
Exploring Internal Numeracy in Language Models: A Case Study on ALBERT.
In MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings. (pp. 35-40). European Language Resources Association (ELRA).
[36]
Müller, M., Dixon, S., Volk, A., Sturm, B., Rao, P. & Gotham, M. (2024).
Introducing the TISMIR Education Track: What, Why, How?
Transactions of the International Society for Music Information Retrieval, 7(1), 85-98.
[37]
Casini, L., Jonason, N., Sturm, B. (2024).
Investigating the Viability of Masked Language Modeling for Symbolic Music Generation in abc-notation.
In Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2024. (pp. 84-96). Springer Nature.
[38]
Ekström, A. G. (2024).
A Theory That Never Was: Wrong Way to the “Dawn of Speech”.
Biolinguistics, 18.
[39]
Kaila, A.-K., Sturm, B. (2024).
Agonistic Dialogue on the Value and Impact of AI Music Applications.
In Proceedings of the 2024 International Conference on AI and Musical Creativity. Oxford, UK.
[40]
Iob, N. A., He, L., Ternström, S., Cai, H. & Brockmann-Bauser, M. (2024).
Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment.
Journal of Speech, Language, and Hearing Research, 1-22.
[41]
Ternström, S. (2024).
Pragmatic De-Noising of Electroglottographic Signals.
Bioengineering, 11(5), 479.
[42]
Cai, H., Ternström, S., Chaffanjon, P. & Henrich Bernardoni, N. (2024).
Effects on Voice Quality of Thyroidectomy : A Qualitative and Quantitative Study Using Voice Maps.
Journal of Voice.
[43]
Borg, A., Parodis, I., Skantze, G. (2024).
Creating Virtual Patients using Robots and Large Language Models : A Preliminary Study with Medical Students.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 273-277). Association for Computing Machinery (ACM).
[44]
Ashkenazi, S., Skantze, G., Stuart-Smith, J., Foster, M. E. (2024).
Goes to the Heart: Speaking the User's Native Language.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 214-218). Association for Computing Machinery (ACM).
[45]
Irfan, B., Staffa, M., Bobu, A., Churamani, N. (2024).
Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning.
In HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. (pp. 1323-1325). Association for Computing Machinery (ACM).
[46]
Wolfert, P., Henter, G. E. & Belpaeme, T. (2024).
Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour.
Applied Sciences, 14(4).
[47]
Sundberg, J., Salomão, G. L. & Scherer, K. R. (2024).
Emotional expressivity in singing : Assessing physiological and acoustic indicators of two opera singers' voice characteristics.
Journal of the Acoustical Society of America, 155(1), 18-28.
[48]
Kalpakchi, D. & Boye, J. (2024).
Quinductor: A multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies.
Natural Language Engineering, 217-255.
[49]
Rosenberg, S., Sundberg, J. & Lã, F. (2024).
Kulning : Acoustic and Perceptual Characteristics of a Calling Style Used Within the Scandinavian Herding Tradition.
Journal of Voice, 38(3), 585-594.
[50]
Baker, C. P., Sundberg, J., Purdy, S. C., Rakena, T. O. & Leão, S. H. D. S. (2024).
CPPS and Voice-Source Parameters : Objective Analysis of the Singing Voice.
Journal of Voice, 38(3), 549-560.