TMH Publications (latest 50)

Below are the 50 latest publications from the Department of Speech, Music and Hearing.

[1]
Saponaro, G., Jamone, L., Bernardino, A. & Salvi, G. (2019). Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions. IEEE Transactions on Cognitive and Developmental Systems.
[2]
Selamtzis, A., Castellana, A., Salvi, G., Carullo, A. & Astolfi, A. (2019). Effect of vowel context in cepstral and entropy analysis of pathological voices. Biomedical Signal Processing and Control, 47, 350-357.
[3]
Fallgren, P., Malisz, Z., Edlund, J. (2019). Bringing order to chaos: A non-sequential approach for browsing large sets of found audio data. In LREC 2018 - 11th International Conference on Language Resources and Evaluation. (pp. 4307-4311). European Language Resources Association (ELRA).
[4]
Shore, T., Androulakaki, T., Skantze, G. (2019). KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue. In LREC 2018 - 11th International Conference on Language Resources and Evaluation. (pp. 768-775). Tokyo.
[5]
Körner Gustafsson, J., Södersten, M., Ternström, S. & Schalling, E. (2019). Long-term effects of Lee Silverman Voice Treatment on daily voice use in Parkinson’s disease as measured with a portable voice accumulator. Logopedics, Phoniatrics, Vocology, 44(3), 124-133.
[6]
Pabon, P. & Ternström, S. (2020). Feature maps of the acoustic spectrum of the voice. Journal of Voice, 34(1), 161.e1-161.e26.
[7]
Bisesi, E., Friberg, A. & Parncutt, R. (2019). A Computational Model of Immanent Accent Salience in Tonal Music. Frontiers in Psychology, 10(317), 1-19.
[8]
Finkel, S., Veit, R., Lotze, M., Friberg, A., Vuust, P., Soekadar, S. ... Kleber, B. (2019). Intermittent theta burst stimulation over right somatosensory larynx cortex enhances vocal pitch‐regulation in nonsingers. Human Brain Mapping.
[9]
Hallström, E., Mossmyr, S., Sturm, B., Vegeborn, V., Wedin, J. (2019). From Jigs and Reels to Schottisar och Polskor: Generating Scandinavian-like Folk Music with Deep Recurrent Networks. Presented at The 16th Sound & Music Computing Conference, Malaga, Spain, 28-31 May 2019.
[10]
Kucherenko, T., Hasegawa, D., Kaneko, N., Henter, G. E., Kjellström, H. (2019). On the Importance of Representations for Speech-Driven Gesture Generation: Extended Abstract. Presented at International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada. (pp. 2072-2074). The International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).
[11]
Kontogiorgos, D., Abelho Pereira, A. T., Gustafson, J. (2019). The Trade-off between Interaction Time and Social Facilitation with Collaborative Social Robots. In The Challenges of Working on Social Robots that Collaborate with People.
[12]
Skantze, G., Gustafson, J. & Beskow, J. (2019). Multimodal Conversational Interaction with Robots. In Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger (Eds.), The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. ACM Press.
[13]
Friberg, A., Bisesi, E., Addessi, A. R. & Baroni, M. (2019). Probing the Underlying Principles of Perceived Immanent Accents Using a Modeling Approach. Frontiers in Psychology, 10.
[14]
Kucherenko, T., Hasegawa, D., Henter, G. E., Kaneko, N., Kjellström, H. (2019). Analyzing Input and Output Representations for Speech-Driven Gesture Generation. In 19th ACM International Conference on Intelligent Virtual Agents. New York, NY, USA: ACM Publications.
[15]
Ternström, S. (2019). Normalized time-domain parameters for electroglottographic waveforms. Journal of the Acoustical Society of America, 146(1), EL65-EL70.
[16]
Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T., Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci).
[17]
Rodríguez-Algarra, F., Sturm, B. & Dixon, S. (2019). Characterising Confounding Effects in Music Classification Experiments through Interventions. Transactions of the International Society for Music Information Retrieval, 52-66.
[18]
Mishra, S., Stoller, D., Benetos, E., Sturm, B., Dixon, S. (2019). GAN-Based Generation and Automatic Selection of Explanations for Neural Networks. Presented at Safe Machine Learning 2019 Workshop at the International Conference on Learning Representations.
[19]
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2).
[20]
Sturm, B., Iglesias, M., Ben-Tal, O., Miron, M. & Gómez, E. (2019). Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. MDPI Arts, 8(3).
[21]
Kontogiorgos, D. (2019). Multimodal Language Grounding for Human-Robot Collaboration: YRRSDS 2019 - Dimosthenis Kontogiorgos. In Young Researchers Roundtable on Spoken Dialogue Systems.
[22]
Lã, F. M. B., Ternström, S. (2019). Flow ball-assisted training: immediate effects on vocal fold contacting. In Pan-European Voice Conference 2019. (pp. 50-51). University of Copenhagen.
[23]
Ternström, S., Pabon, P. (2019). Accounting for variability over the voice range. In Proceedings of the ICA 2019 and EAA Euroregio. (pp. 7775-7780). Aachen, DE: Deutsche Gesellschaft für Akustik (DEGA e.V.).
[24]
Stefanov, K. (2019). Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition. IEEE Transactions on Cognitive and Developmental Systems.
[25]
Clark, L., Cowan, B. R., Edwards, J., Munteanu, C., Murad, C., Aylett, M., Moore, R. K., Edlund, J., Székely, É., Healey, P., Harte, N., Torre, I., Doyle, P. (2019). Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions. In CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).
[26]
Székely, É., Henter, G. E., Gustafson, J. (2019). Casting to corpus: Segmenting and selecting spontaneous dialogue for TTS with a CNN-LSTM speaker-dependent breath detector. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 6925-6929). IEEE.
[27]
Jonell, P., Kucherenko, T., Ekstedt, E., Beskow, J. (2019). Learning Non-verbal Behavior for a Social Robot from YouTube Videos. Presented at ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019.
[28]
Kontogiorgos, D., Pereira, A., Gustafson, J. (2019). Estimating Uncertainty in Task Oriented Dialogue. In ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction.
[29]
Betz, S., Zarrieß, S., Székely, É., Wagner, P. (2019). The greennn tree - lengthening position influences uncertainty perception. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019. (pp. 3990-3994). The International Speech Communication Association (ISCA).
[30]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Spontaneous conversational speech synthesis from found data. Presented at Interspeech.
[31]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Off the cuff: Exploring extemporaneous speech delivery with TTS. Presented at Interspeech.
[32]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). How to train your fillers: uh and um in spontaneous speech synthesis. Presented at The 10th ISCA Speech Synthesis Workshop.
[33]
Zhang, C., Oztireli, C., Mandt, S., Salvi, G. (2019). Active Mini-Batch Sampling Using Repulsive Point Processes. Presented at 33rd AAAI Conference on Artificial Intelligence / 31st Innovative Applications of Artificial Intelligence Conference / 9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, Jan 27-Feb 1, 2019. (pp. 5741-5748). Association for the Advancement of Artificial Intelligence (AAAI).
[34]
Elowsson, A., Friberg, A. (2019). Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN. Presented at 20th International Society for Music Information Retrieval Conference, Delft, Netherlands, November 4-8, 2019.
[35]
Dubois, J., Elowsson, A., Friberg, A. (2019). Predicting Perceived Dissonance of Piano Chords Using a Chord-Class Invariant CNN and Deep Layered Learning. In Proceedings of 16th Sound & Music Computing Conference (SMC), Malaga, Spain. (pp. 530-536).
[36]
Kalpakchi, D., Boye, J. (2019). SpaceRefNet: a neural approach to spatial reference resolution in a real city environment. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. (pp. 422-431). Association for Computational Linguistics.
[37]
Kontogiorgos, D., Abelho Pereira, A. T., Andersson, O., Koivisto, M., Gonzalez Rabal, E., Vartiainen, V., Gustafson, J. (2019). The effects of anthropomorphism and non-verbal social behaviour in virtual assistants. In IVA 2019 - Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents. (pp. 133-140). Association for Computing Machinery (ACM).
[38]
Gulz, T., Holzapfel, A., Friberg, A. (2019). Developing a Method for Identifying Improvisation Strategies in Jazz Duos. In Proc. of the 14th International Symposium on CMMR. (pp. 482-489). Marseille.
[39]
[40]
Arnela, M., Dabbaghchian, S., Guasch, O. & Engwall, O. (2019). MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs. IEEE Transactions on Audio, Speech, and Language Processing, 27(12), 2173-2182.
[41]
Malisz, Z., Henter, G. E., Valentini-Botinhao, C., Watts, O., Beskow, J., Gustafson, J. (2019). Modern speech synthesis for phonetic sciences: A discussion and an evaluation. In Proceedings of ICPhS.
[42]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Off the cuff: Exploring extemporaneous speech delivery with TTS. Presented at The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), Graz, Austria, Sep. 15-19, 2019. (pp. 3687-3688).
[43]
Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Spontaneous conversational speech synthesis from found data. Presented at The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), Graz, Austria, Sep. 15-19, 2019.
[44]
[45]
Sundberg, J. (2019). The Singing Voice. In S. Frühholz and P. Belin (Eds.), The Oxford Handbook of Voice Perception (1st ed., pp. 117-142). Oxford: Oxford University Press.
[46]
Sundberg, J. (2019). Intonation in Singing. In G. Welch, D. M. Howard, J. Nix (Eds.), The Oxford Handbook of Singing (1st ed., pp. 281-296). Oxford: Oxford University Press.
[47]
Sundberg, J. (2019). The Acoustics of Different Genres of Singing. In G. Welch, D. M. Howard, J. Nix (Eds.), The Oxford Handbook of Singing (1st ed., pp. 167-188). Oxford: Oxford University Press.
[48]
Chettri, B., Stoller, D., Morfi, V., Martínez Ramírez, M. A., Benetos, E., Sturm, B. (2019). Ensemble models for spoofing detection in automatic speaker verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019. (pp. 1018-1022). International Speech Communication Association.
[49]
Patel, R., Ternström, S. (2019). Electroglottographic voice maps of untrained vocally healthy adults with gender differences and gradients. In Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA): Proceedings of the 11th International Workshop. (pp. 107-110). Firenze, Italy: Firenze University Press.
[50]
Jonell, P., Lopes, J., Fallgren, P., Wennberg, U., Doğan, F. I., Skantze, G. (2019). Crowdsourcing a self-evolving dialog graph. In CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces. Association for Computing Machinery (ACM).
Full list in the KTH publications portal