TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.


Saponaro, G., Jamone, L., Bernardino, A. & Salvi, G. (2019). Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions. IEEE Transactions on Cognitive and Developmental Systems.
Selamtzis, A., Castellana, A., Salvi, G., Carullo, A. & Astolfi, A. (2019). Effect of vowel context in cepstral and entropy analysis of pathological voices. Biomedical Signal Processing and Control, 47, 350-357.
Fallgren, P., Malisz, Z., Edlund, J. (2019). Bringing order to chaos: A non-sequential approach for browsing large sets of found audio data. In LREC 2018 - 11th International Conference on Language Resources and Evaluation. (pp. 4307-4311). European Language Resources Association (ELRA).
Shore, T., Androulakaki, T., Skantze, G. (2019). KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue. In LREC 2018 - 11th International Conference on Language Resources and Evaluation. (pp. 768-775). Tokyo.
Körner Gustafsson, J., Södersten, M., Ternström, S. & Schalling, E. (2019). Long-term effects of Lee Silverman Voice Treatment on daily voice use in Parkinson’s disease as measured with a portable voice accumulator. Logopedics, Phoniatrics, Vocology, 44(3), 124-133.
Pabon, P. & Ternström, S. (2020). Feature maps of the acoustic spectrum of the voice. Journal of Voice, 34(1), 161.e1-161.e26.
Hallström, E., Mossmyr, S., Sturm, B., Vegeborn, V., Wedin, J. (2019). From Jigs and Reels to Schottisar och Polskor: Generating Scandinavian-like Folk Music with Deep Recurrent Networks. Presented at The 16th Sound & Music Computing Conference, Malaga, Spain, 28-31 May 2019.
Finkel, S., Veit, R., Lotze, M., Friberg, A., Vuust, P., Soekadar, S. ... Kleber, B. (2019). Intermittent theta burst stimulation over right somatosensory larynx cortex enhances vocal pitch‐regulation in nonsingers. Human Brain Mapping.
Bisesi, E., Friberg, A. & Parncutt, R. (2019). A Computational Model of Immanent Accent Salience in Tonal Music. Frontiers in Psychology, 10(317), 1-19.
Ternström, S. (2019). Normalized time-domain parameters for electroglottographic waveforms. Journal of the Acoustical Society of America, 146(1), EL65-EL70.
Kucherenko, T., Hasegawa, D., Henter, G. E., Kaneko, N., Kjellström, H. (2019). Analyzing Input and Output Representations for Speech-Driven Gesture Generation. In 19th ACM International Conference on Intelligent Virtual Agents. New York, NY, USA: ACM Publications.
Skantze, G., Gustafson, J. & Beskow, J. (2019). Multimodal Conversational Interaction with Robots. In Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger (Eds.), The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. ACM Press.
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2).
Mishra, S., Stoller, D., Benetos, E., Sturm, B., Dixon, S. (2019). GAN-Based Generation and Automatic Selection of Explanations for Neural Networks. Presented at Safe Machine Learning 2019 Workshop at the International Conference on Learning Representations.
Kontogiorgos, D. (2019). Multimodal Language Grounding for Human-Robot Collaboration: YRRSDS 2019 - Dimosthenis Kontogiorgos. In Young Researchers Roundtable on Spoken Dialogue Systems.
Ternström, S., Pabon, P. (2019). Accounting for variability over the voice range. In Proceedings of the ICA 2019 and EAA Euroregio. (pp. 7775-7780). Aachen, DE: Deutsche Gesellschaft für Akustik (DEGA e.V.).
Sturm, B., Iglesias, M., Ben-Tal, O., Miron, M. & Gómez, E. (2019). Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. MDPI Arts, 8(3).
Lã, F. M.B., Ternström, S. (2019). Flow ball-assisted training: immediate effects on vocal fold contacting. In Pan-European Voice Conference 2019. (pp. 50-51). University of Copenhagen.
Rodríguez-Algarra, F., Sturm, B. & Dixon, S. (2019). Characterising Confounding Effects in Music Classification Experiments through Interventions. Transactions of the International Society for Music Information Retrieval, 52-66.
Clark, L., Cowan, B. R., Edwards, J., Munteanu, C., Murad, C., Aylett, M., Moore, R. K., Edlund, J., Székely, É., Healey, P., Harte, N., Torre, I., Doyle, P. (2019). Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions. In CHI EA '19 Extended Abstracts: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery.
Kontogiorgos, D., Abelho Pereira, A. T., Gustafson, J. (2019). Estimating Uncertainty in Task Oriented Dialogue. In ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction. (pp. 414-418). ACM Digital Library.
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Off the cuff: Exploring extemporaneous speech delivery with TTS. Presented at Interspeech.
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). How to train your fillers: uh and um in spontaneous speech synthesis. Presented at The 10th ISCA Speech Synthesis Workshop.
Zhang, C., Oztireli, C., Mandt, S., Salvi, G. (2019). Active Mini-Batch Sampling Using Repulsive Point Processes. Presented at 33rd AAAI Conference on Artificial Intelligence / 31st Innovative Applications of Artificial Intelligence Conference / 9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, January 27-February 1, 2019. (pp. 5741-5748). Association for the Advancement of Artificial Intelligence.
Jonell, P., Kucherenko, T., Ekstedt, E., Beskow, J. (2019). Learning Non-verbal Behavior for a Social Robot from YouTube Videos. Presented at ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019.
Sundberg, J. (2019). Intonation in Singing. In G Welch, DM Howard, J Nix (Eds.), The Oxford Handbook of Singing (1st ed., pp. 281-296). Oxford: Oxford University Press.
Gulz, T., Holzapfel, A., Friberg, A. (2019). Developing a Method for Identifying Improvisation Strategies in Jazz Duos. In Proc. of the 14th International Symposium on CMMR. (pp. 482-489). Marseille Cedex.
Elowsson, A., Friberg, A. (2019). Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN. Presented at 20th International Society for Music Information Retrieval Conference, Delft, Netherlands, November 4-8, 2019.
Malisz, Z., Henter, G. E., Valentini-Botinhao, C., Watts, O., Beskow, J., Gustafson, J. (2019). Modern speech synthesis for phonetic sciences: A discussion and an evaluation. In Proceedings of ICPhS.
Arnela, M., Dabbaghchian, S., Guasch, O. & Engwall, O. (2019). MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs. IEEE Transactions on Audio, Speech, and Language Processing, 27(12), 2173-2182.
Ardal, D., Alexandersson, S., Lempert, M., Abelho Pereira, A. T. (2019). A Collaborative Previsualization Tool for Filmmaking in Virtual Reality. In Proceedings - CVMP 2019: 16th ACM SIGGRAPH European Conference on Visual Media Production. ACM Digital Library.
Chettri, B., Stoller, D., Morfi, V., Martínez Ramírez, M. A., Benetos, E., Sturm, B. (2019). Ensemble models for spoofing detection in automatic speaker verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019. (pp. 1018-1022). International Speech Communication Association.
Henter, G. E., Alexanderson, S. & Beskow, J. (2019). Moglow: Probabilistic and controllable motion synthesis using normalising flows. arXiv preprint arXiv:1905.06598.
Engwall, O. (2020). Robot interaction styles for conversation practice in second language learning. International Journal of Social Robotics.
Lousseief, E., Sturm, B. (2019). MahlerNet: Unbounded Orchestral Music with Neural Networks. In Combined proceedings of the Nordic Sound and Music Computing Conference 2019 and the Interactive Sonification Workshop 2019. (pp. 57-63).
Patel, R., Ternström, S. (2019). Electroglottographic voice maps of untrained vocally healthy adults with gender differences and gradients. In Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA): 11th International Workshop. (pp. 107-110). Firenze, Italy: Firenze University Press.
Kontogiorgos, D., van Waveren, S., Wallberg, O., Abelho Pereira, A. T., Leite, I., Gustafson, J. (2020). Embodiment Effects in Interactions with Failing Robots. Presented at SIGCHI Conference on Human Factors in Computing Systems, CHI ’20, April 25–30, 2020, Honolulu, HI, USA. ACM Digital Library.
Axelsson, N., Skantze, G. (2019). Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees. In 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue: Proceedings of the Conference. (pp. 345-352). Stroudsburg, PA.
Jonell, P., Lopes, J., Fallgren, P., Wennberg, U., Doğan, F. I., Skantze, G. (2019). Crowdsourcing a self-evolving dialog graph. In CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces. Association for Computing Machinery (ACM).
Alexanderson, S., Henter, G. E., Kucherenko, T., Beskow, J. (2020). Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. Presented at EUROGRAPHICS 2020.
Abelho Pereira, A. T., Oertel, C., Fermoselle, L., Mendelson, J., Gustafson, J. (2020). Effects of Different Interaction Contexts when Evaluating Gaze Models in HRI. Presented at International Conference on Human Robot Interaction (HRI).
Kontogiorgos, D., Abelho Pereira, A. T., Sahindal, B., van Waveren, S., Gustafson, J. (2020). Behavioural Responses to Robot Conversational Failures. Presented at International Conference on Human Robot Interaction (HRI), HRI ’20, March 23–26, 2020, Cambridge, United Kingdom. ACM Digital Library.
Ibrahim, O., Skantze, G., Stoll, S., Dellwo, V. (2019). Fundamental frequency accommodation in multi-party human-robot game interactions: The effect of winning or losing. In Proceedings Interspeech 2019. (pp. 3980-3984). International Speech Communication Association.
Fallgren, P., Malisz, Z., Edlund, J. (2019). How to annotate 100 hours in 45 minutes. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 341-345). ISCA.
Purwins, H., Sturm, B., Li, B., Nam, J. & Alwan, A. (2019). Introduction to the Issue on Data Science: Machine Learning for Audio Signal Processing. IEEE Journal on Selected Topics in Signal Processing, 13(2), 203-205.
Leijon, A., Dahlquist, M. & Smeds, K. (2019). Bayesian analysis of paired-comparison sound quality ratings. Journal of the Acoustical Society of America, 146(5), 3174-3183.
Kontogiorgos, D., Pelikan, H. (2020). Towards Adaptive and Least-Collaborative-Effort Social Robots. In ACM/IEEE International Conference on Human-Robot Interaction. IEEE conference proceedings.
Full list in the KTH publication portal