TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]
Saponaro, G., Jamone, L., Bernardino, A. & Salvi, G. (2019). Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions. IEEE Transactions on Cognitive and Developmental Systems.
[2]
Selamtzis, A., Castellana, A., Salvi, G., Carullo, A. & Astolfi, A. (2019). Effect of vowel context in cepstral and entropy analysis of pathological voices. Biomedical Signal Processing and Control, 47, 350-357.
[3]
Per, F., Malisz, Z., Edlund, J. (2019). Bringing order to chaos: A non-sequential approach for browsing large sets of found audio data. In LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp. 4307-4311). European Language Resources Association (ELRA).
[4]
Shore, T., Androulakaki, T., Skantze, G. (2019). KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue. In LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp. 768-775). Tokyo.
[5]
Körner Gustafsson, J., Södersten, M., Ternström, S. & Schalling, E. (2019). Long-term effects of Lee Silverman Voice Treatment on daily voice use in Parkinson’s disease as measured with a portable voice accumulator. Logopedics, Phoniatrics, Vocology, 44(3), 124-133.
[6]
Selamtzis, A., Ternström, S., Richter, B., Burk, F., Köberlein, M., Echternach, M. (2018). A comparison of electroglottographic and glottal area waveforms for phonation type differentiation in male professional singers. (Manuscript).
[7]
Selamtzis, A., Ternström, S., Richter, B., Burk, F., Köberlein, M. & Echternach, M. (2018). A comparison of electroglottographic and glottal area waveforms for phonation type differentiation in male professional singers. Journal of the Acoustical Society of America, 144(6), 3275-3288.
[8]
Bisesi, E., Friberg, A. & Parncutt, R. (2019). A Computational Model of Immanent Accent Salience in Tonal Music. Frontiers in Psychology, 10(317), 1-19.
[9]
Finkel, S., Veit, R., Lotze, M., Friberg, A., Vuust, P., Soekadar, S. ... Kleber, B. (2019). Intermittent theta burst stimulation over right somatosensory larynx cortex enhances vocal pitch‐regulation in nonsingers. Human Brain Mapping.
[10]
Hallström, E., Mossmyr, S., Sturm, B., Vegeborn, V., Wedin, J. (2019). From Jigs and Reels to Schottisar och Polskor: Generating Scandinavian-like Folk Music with Deep Recurrent Networks. Presented at the 16th Sound & Music Computing Conference, Malaga, Spain, 28-31 May 2019.
[11]
Kucherenko, T., Hasegawa, D., Naoshi, K., Henter, G. E., Kjellström, H. (2019). On the Importance of Representations for Speech-Driven Gesture Generation: Extended Abstract. Presented at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada (pp. 2072-2074). The International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).
[12]
Kontogiorgos, D., Abelho Pereira, A. T., Gustafson, J. (2019). The Trade-off between Interaction Time and Social Facilitation with Collaborative Social Robots. In The Challenges of Working on Social Robots that Collaborate with People.
[13]
Skantze, G., Gustafson, J. & Beskow, J. (2019). Multimodal Conversational Interaction with Robots. In Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger (Eds.), The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. ACM Press.
[14]
Friberg, A., Bisesi, E., Addessi, A. R. & Baroni, M. (2019). Probing the Underlying Principles of Perceived Immanent Accents Using a Modeling Approach. Frontiers in Psychology, 10.
[15]
Kucherenko, T., Hasegawa, D., Henter, G. E., Kaneko, N., Kjellström, H. (2019). Analyzing Input and Output Representations for Speech-Driven Gesture Generation. In 19th ACM International Conference on Intelligent Virtual Agents. New York, NY, USA: ACM Publications.
[16]
Ternström, S. (2019). Normalized time-domain parameters for electroglottographic waveforms. Journal of the Acoustical Society of America, 146(1), EL65-EL70.
[17]
Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T., Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci).
[18]
Rodríguez-Algarra, F., Sturm, B. & Dixon, S. (2019). Characterising Confounding Effects in Music Classification Experiments through Interventions. Transactions of the International Society for Music Information Retrieval, 52-66.
[19]
Mishra, S., Stoller, D., Benetos, E., Sturm, B., Dixon, S. (2019). GAN-Based Generation and Automatic Selection of Explanations for Neural Networks. Presented at the Safe Machine Learning 2019 Workshop at the International Conference on Learning Representations.
[20]
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2).
[21]
Sturm, B., Iglesias, M., Ben-Tal, O., Miron, M. & Gómez, E. (2019). Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. MDPI Arts, 8(3).
[22]
Kontogiorgos, D. (2019). Multimodal Language Grounding for Human-Robot Collaboration: YRRSDS 2019 - Dimosthenis Kontogiorgos. In Young Researchers Roundtable on Spoken Dialogue Systems.
[23]
Lã, F. M. B., Ternström, S. (2019). Flow ball-assisted training: immediate effects on vocal fold contacting. In Pan-European Voice Conference 2019 (pp. 50-51). University of Copenhagen.
[24]
Ternström, S., Pabon, P. (2019). Accounting for variability over the voice range. In Proceedings of the ICA 2019 and EAA Euroregio (pp. 7775-7780). Aachen, DE: Deutsche Gesellschaft für Akustik (DEGA e.V.).
[25]
Stefanov, K. (2019). Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition. IEEE Transactions on Cognitive and Developmental Systems.
[26]
Clark, L., Cowan, B. R., Edwards, J., Munteanu, C., Murad, C., Aylett, M., Moore, R. K., Edlund, J., Székely, É., Healey, P., Harte, N., Torre, I., Doyle, P. (2019). Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions. In CHI EA '19 Extended Abstracts: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery.
[27]
Székely, É., Henter, G. E., Gustafson, J. (2019). Casting to Corpus: Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6925-6929). IEEE.
[28]
Jonell, P., Kucherenko, T., Ekstedt, E., Beskow, J. (2019). Learning Non-verbal Behavior for a Social Robot from YouTube Videos. Presented at the ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019.
[29]
Kontogiorgos, D., Pereira, A., Gustafson, J. (2019). Estimating Uncertainty in Task Oriented Dialogue. In ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction.
[30]
Betz, S., Zarrieß, S., Székely, É., Wagner, P. (2019). The greennn tree - lengthening position influences uncertainty perception. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019 (pp. 3990-3994). The International Speech Communication Association (ISCA).
[31]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Spontaneous conversational speech synthesis from found data. Presented at Interspeech.
[32]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Off the cuff: Exploring extemporaneous speech delivery with TTS. Presented at Interspeech.
[33]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). How to train your fillers: uh and um in spontaneous speech synthesis. Presented at the 10th ISCA Speech Synthesis Workshop.
[34]
Zhang, C., Oztireli, C., Mandt, S., Salvi, G. (2019). Active Mini-Batch Sampling Using Repulsive Point Processes. Presented at the 33rd AAAI Conference on Artificial Intelligence / 31st Innovative Applications of Artificial Intelligence Conference / 9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, January 27-February 1, 2019 (pp. 5741-5748). Association for the Advancement of Artificial Intelligence.
[35]
Elowsson, A., Friberg, A. (2019). Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN. Presented at the 20th International Society for Music Information Retrieval Conference, Delft, Netherlands, November 4-8, 2019.
[36]
Dubois, J., Elowsson, A., Friberg, A. (2019). Predicting Perceived Dissonance of Piano Chords Using a Chord-Class Invariant CNN and Deep Layered Learning. In Proceedings of the 16th Sound & Music Computing Conference (SMC), Malaga, Spain (pp. 530-536).
[37]
Kalpakchi, D., Boye, J. (2019). SpaceRefNet: a neural approach to spatial reference resolution in a real city environment. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue (pp. 422-431). Association for Computational Linguistics.
[38]
Kontogiorgos, D., Abelho Pereira, A. T., Andersson, O., Koivisto, M., Gonzalez Rabal, E., Vartiainen, V., Gustafson, J. (2019). The effects of anthropomorphism and non-verbal social behaviour in virtual assistants. In IVA 2019 - Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (pp. 133-140). Association for Computing Machinery (ACM).
[39]
Gulz, T., Holzapfel, A., Friberg, A. (2019). Developing a Method for Identifying Improvisation Strategies in Jazz Duos. In Proc. of the 14th International Symposium on CMMR (pp. 482-489). Marseille Cedex.
[40]
Jonell, P. (2019). Using Social and Physiological Signals for User Adaptation in Conversational Agents. In AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (pp. 2420-2422). Association for Computing Machinery.
[42]
Arnela, M., Dabbaghchian, S., Guasch, O. & Engwall, O. (2019). MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs. IEEE Transactions on Audio, Speech, and Language Processing, 27(12), 2173-2182.
[43]
Malisz, Z., Henter, G. E., Valentini-Botinhao, C., Watts, O., Beskow, J., Gustafson, J. (2019). Modern speech synthesis for phonetic sciences: A discussion and an evaluation. In Proceedings of ICPhS.
[44]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Off the cuff: Exploring extemporaneous speech delivery with TTS. Presented at the 20th Annual Conference of the International Speech Communication Association, INTERSPEECH 2019, Graz, Austria, Sep. 15-19, 2019 (pp. 3687-3688).
[45]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Spontaneous conversational speech synthesis from found data. Presented at the 20th Annual Conference of the International Speech Communication Association, INTERSPEECH 2019, Graz, Austria, Sep. 15-19, 2019.
[47]
Sundberg, J. (2019). The Singing Voice. In S. Frühholz and P. Belin (Eds.), The Oxford Handbook of Voice Perception (1st ed., pp. 117-142). Oxford: Oxford University Press.
[48]
Sundberg, J. (2019). Intonation in Singing. In G. Welch, D. M. Howard, J. Nix (Eds.), The Oxford Handbook of Singing (1st ed., pp. 281-296). Oxford: Oxford University Press.
[49]
Sundberg, J. (2019). The Acoustics of Different Genres of Singing. In G. Welch, D. M. Howard, J. Nix (Eds.), The Oxford Handbook of Singing (1st ed., pp. 167-188). Oxford: Oxford University Press.
[50]
Chettri, B., Stoller, D., Morfi, V., Martínez Ramírez, M. A., Benetos, E., Sturm, B. (2019). Ensemble models for spoofing detection in automatic speaker verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019 (pp. 1018-1022). International Speech Communication Association.
Full list in the KTH publication portal