Publications by Joakim Gustafsson

Peer reviewed

Articles

[1]

Kontogiorgos, D., Abelho Pereira, A. T. & Gustafsson, J. (2021). Grounding behaviours with conversational interfaces: effects of embodiment and failures. Journal on Multimodal User Interfaces, 15(2), 239-254.

[2]

Kontogiorgos, D. & Gustafson, J. (2021). Measuring Collaboration Load With Pupillary Responses-Implications for the Design of Instructions in Task-Oriented HRI. Frontiers in Psychology, 12.

[3]

Jonell, P., Moell, B., Håkansson, K., Henter, G. E., Kucherenko, T., Mikheeva, O. ... Beskow, J. (2021). Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia : Clinical Feasibility and Preliminary Results. Frontiers in Computer Science, 3.

[4]

Oertel, C., Jonell, P., Kontogiorgos, D., Mora, K. F., Odobez, J.-M. & Gustafsson, J. (2021). Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions. Frontiers in Robotics and AI, 8.

[5]

Meena, R., Skantze, G. & Gustafsson, J. (2014). Data-driven models for timing feedback responses in a Map Task dialogue system. Computer speech & language (Print), 28(4), 903-922.

[6]

Mirnig, N., Weiss, A., Skantze, G., Al Moubayed, S., Gustafson, J., Beskow, J. ... Tscheligi, M. (2013). Face-To-Face With A Robot : What do we actually talk about?. International Journal of Humanoid Robotics, 10(1), 1350011.

[7]

Neiberg, D., Salvi, G. & Gustafson, J. (2013). Semi-supervised methods for exploring the acoustics of simple productive feedback. Speech Communication, 55(3), 451-469.

[8]

Edlund, J., Gustafson, J., Heldner, M. & Hjalmarsson, A. (2008). Towards human-like spoken dialogue systems. Speech Communication, 50(8-9), 630-645.

[9]

Boye, J., Gustafson, J. & Wiren, M. (2006). Robust spoken language understanding in a computer game. Speech Communication, 48(03-4), 335-353.

[10]

Gustafson, J. & Bell, L. (2000). Speech technology on trial : Experiences from the August system. Natural Language Engineering, 6(3-4), 273-286.

Conference papers

[11]

Marcinek, L., Beskow, J., Gustafsson, J. (2025). A Dual-Control Dialogue Framework for Human-Robot Interaction Data Collection : Integrating Human Emotional and Contextual Awareness with Conversational AI. In Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. (pp. 290-297). Springer Nature.

[12]

Francis, J., Gustafsson, J., Székely, É. (2025). From Static to Dynamic : Enhancing AAC with Generative Imagery and Zero-Shot TTS. In Interspeech 2025. (pp. 4960-4962). International Speech Communication Association.

[13]

Marcinek, L., Beskow, J., Gustafsson, J. (2025). Towards Adaptable and Intelligible Speech Synthesis in Noisy Environments. In Interspeech 2025. (pp. 2165-2169). International Speech Communication Association.

[14]

Lameris, H., Gustafsson, J., Székely, É. (2025). VoiceQualityVC : A Voice Conversion System for Studying the Perceptual Effects of Voice Quality in Speech. In Interspeech 2025. (pp. 2295-2299). International Speech Communication Association.

[15]

Francis, J., Székely, É., Gustafsson, J. (2024). ConnecTone : A Modular AAC System Prototype with Contextual Generative Text Prediction and Style-Adaptive Conversational TTS. In Interspeech 2024. (pp. 1001-1002). International Speech Communication Association.

[16]

Wang, S., Székely, É., Gustafsson, J. (2024). Contextual Interactive Evaluation of TTS Models in Dialogue Systems. In Interspeech 2024. (pp. 2965-2969). International Speech Communication Association.

[17]

Lameris, H., Gustafsson, J., Székely, É. (2024). CreakVC : A Voice Conversion Tool for Modulating Creaky Voice. In Interspeech 2024. (pp. 1005-1006). International Speech Communication Association.

[18]

Abelho Pereira, A. T., Marcinek, L., Miniotaitė, J., Thunberg, S., Lagerstedt, E., Gustafsson, J., Skantze, G., Irfan, B. (2024). Multimodal User Enjoyment Detection in Human-Robot Conversation : The Power of Large Language Models. Presented at 26th International Conference on Multimodal Interaction (ICMI), San Jose, USA, November 4-8, 2024. (pp. 469-478). Association for Computing Machinery (ACM).

[19]

Tånnander, C., Edlund, J., Gustafsson, J. (2024). Revisiting Three Text-to-Speech Synthesis Experiments with a Web-Based Audience Response System. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 14111-14121). European Language Resources Association (ELRA).

[20]

Lameris, H., Székely, É., Gustafsson, J. (2024). The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. (pp. 16058-16065). European Language Resources Association (ELRA).

[21]

Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS. In ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Institute of Electrical and Electronics Engineers (IEEE).

[22]

Peña, P. R., Doyle, P. R., Ip, E. Y., Di Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafsson, J., McMillan, D., Moore, R. J., Cowan, B. R. (2023). A Special Interest Group on Developing Theories of Language Use in Interaction with Conversational User Interfaces. In CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (ACM).

[23]

Ekstedt, E., Wang, S., Székely, É., Gustafsson, J., Skantze, G. (2023). Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023. (pp. 5481-5485). International Speech Communication Association.

[24]

Lameris, H., Gustafsson, J., Székely, É. (2023). Beyond style : synthesizing speech with pragmatic functions. In Interspeech 2023. (pp. 3382-3386). International Speech Communication Association.

[25]

Gustafsson, J., Székely, É., Alexanderson, S., Beskow, J. (2023). Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters. In 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition, FG 2023. Institute of Electrical and Electronics Engineers (IEEE).

[26]

Gustafsson, J., Székely, É., Beskow, J. (2023). Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters. In 23rd ACM International Conference on Interlligent Virtual Agent (IVA 2023). Institute of Electrical and Electronics Engineers (IEEE).

[27]

Miniotaitė, J., Wang, S., Beskow, J., Gustafson, J., Székely, É., Abelho Pereira, A. T. (2023). Hi robot, it's not what you say, it's how you say it. In 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN. (pp. 307-314). Institute of Electrical and Electronics Engineers (IEEE).

[28]

Lameris, H., Wlodarczak, M., Gustafsson, J., Székely, É. (2023). Neural speech synthesis with controllable creaky voice style. In Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023. (pp. 3141-3145).

[29]

Kirkland, A., Gustafsson, J., Székely, É. (2023). Pardon my disfluency : The impact of disfluency effects on the perception of speaker competence and confidence. In Interspeech 2023. (pp. 5217-5221). International Speech Communication Association.

[30]

Lameris, H., Mehta, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). Prosody-Controllable Spontaneous TTS with Neural HMMs. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Institute of Electrical and Electronics Engineers (IEEE).

[31]

Székely, É., Gustafsson, J., Torre, I. (2023). Prosody-controllable gender-ambiguous speech synthesis : a tool for investigating implicit bias in speech perception. In Interspeech 2023. (pp. 1234-1238). International Speech Communication Association.

[32]

Lameris, H., Kirkland, A., Gustafsson, J., Székely, É. (2023). Situating speech synthesis : Investigating contextual factors in the evaluation of conversational TTS. In Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France, August 26–28, 2023. (pp. 69-74). International Speech Communication Association.

[33]

Székely, É., Wang, S., Gustafsson, J. (2023). So-to-Speak : an exploratory platform for investigating the interplay between style and prosody in TTS. In Interspeech 2023. (pp. 2016-2017). International Speech Communication Association.

[34]

Wang, S., Gustafsson, J., Székely, É. (2022). Evaluating Sampling-based Filler Insertion with Spontaneous TTS. In LREC 2022: Thirteen International Conference On Language Resources And Evaluation. (pp. 1960-1969). European Language Resources Association (ELRA).

[35]

Moell, B., O'Regan, J., Mehta, S., Kirkland, A., Lameris, H., Gustafsson, J., Beskow, J. (2022). Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge. In The RaPID4 Workshop: Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments. (pp. 62-70). Marseille, France.

[36]

Kirkland, A., Lameris, H., Székely, É., Gustafsson, J. (2022). Where's the uh, hesitation? : The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence. In INTERSPEECH 2022. (pp. 4990-4994). International Speech Communication Association.

[37]

Kontogiorgos, D., Tran, M., Gustafsson, J., Soleymani, M. (2021). A Systematic Cross-Corpus Analysis of Human Reactions to Robot Conversational Failures. In ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction. (pp. 112-120). Association for Computing Machinery (ACM).

[38]

Wang, S., Alexanderson, S., Gustafsson, J., Beskow, J., Henter, G. E., Székely, É. (2021). Integrated Speech and Gesture Synthesis. In ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction. (pp. 177-185). Association for Computing Machinery (ACM).

[39]

Kirkland, A., Włodarczak, M., Gustafsson, J., Székely, É. (2021). Perception of smiling voice in spontaneous speech synthesis. In Proceedings of Speech Synthesis Workshop (SSW11). (pp. 108-112). International Speech Communication Association.

[40]

Székely, É., Edlund, J., Gustafsson, J. (2020). Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis. In Proceedings of The 12th Language Resources and Evaluation Conference. (pp. 6368-6374). European Language Resources Association.

[41]

Kontogiorgos, D., Abelho Pereira, A. T., Sahindal, B., van Waveren, S., Gustafson, J. (2020). Behavioural Responses to Robot Conversational Failures. In HRI '20: Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction. ACM Digital Library.

[42]

Székely, É., Henter, G. E., Beskow, J., Gustafsson, J. (2020). Breathing and Speech Planning in Spontaneous Speech Synthesis. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 7649-7653). IEEE.

[43]

Kontogiorgos, D., Sibirtseva, E., Gustafson, J. (2020). Chinese whispers : A multimodal dataset for embodied language grounding. In LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. (pp. 743-749). European Language Resources Association (ELRA).

[44]

Abelho Pereira, A. T., Oertel, C., Fermoselle, L., Mendelson, J., Gustafson, J. (2020). Effects of Different Interaction Contexts when Evaluating Gaze Models in HRI. Presented at ACM/IEEE International Conference on Human-Robot Interaction (HRI), MAR 23-26, 2020, Cambridge, ENGLAND. (pp. 131-138). Association for Computing Machinery (ACM).

[45]

Kontogiorgos, D., van Waveren, S., Wallberg, O., Abelho Pereira, A. T., Leite, I., Gustafson, J. (2020). Embodiment Effects in Interactions with Failing Robots. In CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM Digital Library.

[46]

Håkansson, K., Beskow, J., Kjellström, H., Gustafsson, J., Bonnard, A., Rydén, M., Stormoen, S., Hagman, G., Akenine, U., Peres, K. M., Henter, G. E., Kivipelto, M. (2020). Robot-assisted detection of subclinical dementia : progress report and preliminary findings. In In 2020 Alzheimer's Association International Conference. ALZ...

[47]

Székely, É., Henter, G. E., Gustafson, J. (2019). Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 6925-6929). IEEE.

[48]

Kontogiorgos, D., Abelho Pereira, A. T., Gustafson, J. (2019). Estimating Uncertainty in Task Oriented Dialogue. In ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction. (pp. 414-418). ACM Digital Library.

[49]

Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). How to train your fillers: uh and um in spontaneous speech synthesis. Presented at The 10th ISCA Speech Synthesis Workshop.

[50]

Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Off the cuff : Exploring extemporaneous speech delivery with TTS. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 3687-3688). International Speech Communication Association.

[51]

Malisz, Z., Berthelsen, H., Beskow, J., Gustafson, J. (2019). PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network. In Proceedings of SSW 10 - The 10th ISCA Speech Synthesis Workshop. Vienna.

[52]

Abelho Pereira, A. T., Oertel, C., Fermoselle, L., Mendelson, J., Gustafson, J. (2019). Responsive Joint Attention in Human-Robot Interaction. In Proceedings 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019. (pp. 1080-1087). Institute of Electrical and Electronics Engineers (IEEE).

[53]

Wagner, P., Beskow, J., Betz, S., Edlund, J., Gustafson, J., Henter, G. E., Le Maguer, S., Malisz, Z., Székely, É., Tånnander, C. (2019). Speech Synthesis Evaluation : State-of-the-Art Assessment and Suggestion for a Novel Research Program. In Proceedings of the 10th Speech Synthesis Workshop (SSW10)..

[54]

Székely, É., Henter, G. E., Beskow, J., Gustafson, J. (2019). Spontaneous conversational speech synthesis from found data. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 4435-4439). ISCA.

[55]

Tånnander, C., Fallgren, P., Edlund, J., Gustafson, J. (2019). Spot the pleasant people! Navigating the cocktail party buzz. In Proceedings Interspeech 2019, 20th Annual Conference of the International Speech Communication Association. (pp. 4220-4224).

[56]

Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T., Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci)..

[57]

Kontogiorgos, D., Abelho Pereira, A. T., Gustafson, J. (2019). The Trade-off between Interaction Time and Social Facilitation with Collaborative Social Robots. In The Challenges of Working on Social Robots that Collaborate with People..

[58]

Kontogiorgos, D., Abelho Pereira, A. T., Andersson, O., Koivisto, M., Gonzalez Rabal, E., Vartiainen, V., Gustafson, J. (2019). The effects of anthropomorphism and non-verbal social behaviour in virtual assistants. In IVA 2019 - Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents. (pp. 133-140). Association for Computing Machinery (ACM).

[59]

Malisz, Z., Henter, G. E., Valentini-Botinhao, C., Watts, O., Beskow, J., Gustafson, J. (2019). The speech synthesis phoneticians need is both realistic and controllable. In Proceedings from FONETIK 2019. Stockholm.

[60]

Sibirtseva, E., Kontogiorgos, D., Nykvist, O., Karaoǧuz, H., Leite, I., Gustafson, J., Kragic, D. (2018). A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction. In Proceedings 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) 2018. IEEE.

[61]

Kontogiorgos, D., Avramova, V., Alexanderson, S., Jonell, P., Oertel, C., Beskow, J., Skantze, G., Gustafson, J. (2018). A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). (pp. 119-127). Paris.

[62]

Jonell, P., Oertel, C., Kontogiorgos, D., Beskow, J., Gustafson, J. (2018). Crowdsourced Multimodal Corpora Collection Tool. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). (pp. 728-734). Paris.

[63]

Kragic, D., Gustafson, J., Karaoǧuz, H., Jensfelt, P., Krug, R. (2018). Interactive, collaborative robots : Challenges and opportunities. In IJCAI International Joint Conference on Artificial Intelligence. (pp. 18-25). International Joint Conferences on Artificial Intelligence.

[64]

Kontogiorgos, D., Sibirtseva, E., Pereira, A., Skantze, G., Gustafson, J. (2018). Multimodal reference resolution in collaborative assembly tasks. In Multimodal reference resolution in collaborative assembly tasks. ACM Digital Library.

[65]

Székely, É., Wagner, P., Gustafson, J. (2018). THE WRYLIE-BOARD: MAPPING ACOUSTIC SPACE OF EXPRESSIVE FEEDBACK TO ATTITUDE MARKERS. In Proc. IEEE Spoken Language Technology conference..

[66]

Malisz, Z., Berthelsen, H., Beskow, J., Gustafson, J. (2017). Controlling prominence realisation in parametric DNN-based speech synthesis. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. (pp. 1079-1083). International Speech Communication Association.

[67]

Oertel, C., Jonell, P., Kontogiorgos, D., Mendelson, J., Beskow, J., Gustafson, J. (2017). Crowd-Sourced Design of Artificial Attentive Listeners. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 854-858). International Speech Communication Association.

[68]

Jonell, P., Oertel, C., Kontogiorgos, D., Beskow, J., Gustafson, J. (2017). Crowd-powered design of virtual attentive listeners. In 17th International Conference on Intelligent Virtual Agents, IVA 2017. (pp. 188-191). Springer.

[69]

Oertel, C., Jonell, P., Kontogiorgos, D., Mendelson, J., Beskow, J., Gustafson, J. (2017). Crowdsourced design of artificial attentive listeners. Presented at INTERSPEECH: Situated Interaction, Augusti 20-24 Augusti, 2017.

[70]

Heldner, M., Gustafson, J., Strömbergsson, S. (2017). Message from the technical program chairs. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech Communication Association.

[71]

Szekely, E., Mendelson, J., Gustafson, J. (2017). Synthesising uncertainty : The interplay of vocal effort and hesitation disfluencies. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 804-808). International Speech Communication Association.

[72]

Oertel, C., Jonell, P., Haddad, K. E., Szekely, E., Gustafson, J. (2017). Using crowd-sourcing for the design of listening agents : Challenges and opportunities. In ISIAA 2017 - Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, Co-located with ICMI 2017. (pp. 37-38). Association for Computing Machinery (ACM).

[73]

Edlund, J., Gustafson, J. (2016). Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. (pp. 4531-4534). European Language Resources Association (ELRA).

[74]

Johansson, M., Hori, T., Skantze, G., Hothker, A., Gustafson, J. (2016). Making Turn-Taking Decisions for an Active Listening Robot for Memory Training. In SOCIAL ROBOTICS, (ICSR 2016). (pp. 940-949). Springer.

[75]

Oertel, C., Gustafson, J., Black, A. (2016). On Data Driven Parametric Backchannel Synthesis for Expressing Attentiveness in Conversational Agents. In Proceedings of Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction (MA3HMI), satellite workshop of ICMI 2016..

[76]

Oertel, C., David Lopes, J., Yu, Y., Funes, K., Gustafson, J., Black, A., Odobez, J.-M. (2016). Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Audio-Visual Feedback Tokens. In Proceedings of the 18th ACM International Conference on Multimodal Interaction (ICMI 2016). Tokyo, Japan.

[77]

Oertel, C., Gustafson, J., Black, A. (2016). Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback Utterances. In Proceedings of Interspeech 2016. San Fransisco, USA.

[78]

Edlund, J., Tånnander, C., Gustafson, J. (2015). Audience response system-based assessment for analysis-by-synthesis. In Proc. of ICPhS 2015. ICPhS.

[79]

Meena, R., David Lopes, J., Skantze, G., Gustafson, J. (2015). Automatic Detection of Miscommunication in Spoken Dialogue Systems. In Proceedings of 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). (pp. 354-363).

[80]

Oertel, C., Funes, K., Gustafson, J., Odobez, J.-M. (2015). Deciphering the Silent Participant : On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions. In Proccedings of ICMI 2015. ACM Digital Library.

[81]

Lopes, J., Salvi, G., Skantze, G., Abad, A., Gustafson, J., Batista, F., Meena, R., Trancoso, I. (2015). Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances. In INTERSPEECH-2015. (pp. 1805-1809).

[82]

Bollepalli, B., Urbain, J., Raitio, T., Gustafson, J., Cakmak, H. (2014). A COMPARATIVE EVALUATION OF VOCODING TECHNIQUES FOR HMM-BASED LAUGHTER SYNTHESIS. Presented at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 04-09, 2014, Florence, ITALY. (pp. 255-259).

[83]

Johansson, M., Skantze, G., Gustafson, J. (2014). Comparison of human-human and human-robot Turn-taking Behaviour in multi-party Situated interaction. In UM3I '14: Proceedings of the 2014 workshop on Understanding and Modeling Multiparty, Multimodal Interactions. (pp. 21-26). Istanbul, Turkey.

[84]

Meena, R., Boye, J., Skantze, G., Gustafson, J. (2014). Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System. In Proceedings of the SIGDIAL 2014 Conference. (pp. 2-11). Association for Computational Linguistics.

[85]

Edlund, J., Edelstam, F., Gustafson, J. (2014). Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems. In Proceedings of the of the EACL 2014 Workshop on Dialogue in Motion (DM). (pp. 73-77). Gothenburg, Sweden.

[86]

Al Moubayed, S., Beskow, J., Bollepalli, B., Gustafson, J., Hussen-Abdelaziz, A., Johansson, M., Koutsombogera, M., Lopes, J. D., Novikova, J., Oertel, C., Skantze, G., Stefanov, K., Varol, G. (2014). Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue. Presented at 9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany. IEEE conference proceedings.

[87]

Dalmas, T., Götze, J., Gustafsson, J., Janarthanam, S., Kleindienst, J., Mueller, C., Stent, A., Vlachos, A., Artzi, Y., Benotti, L., Boye, J., Clark, S., Curin, J., Dethlefs, N., Edlund, J., Goldwasser, D., Heeman, P., Jurcicek, F., Kelleher, J., Komatani, K., Kwiatkowski, T., Larsson, S., Lemon, O., Lenke, N., Macek, J., Macek, T., Mooney, R., Ramachandran, D., Rieser, V., Shi, H., Tenbrink, T., Williams, J. (2014). Introduction. In Proceedings 2014 Workshop on Dialogue in Motion, DM 2014. Association for Computational Linguistics (ACL).

[88]

Meena, R., Boye, J., Skantze, G., Gustafson, J. (2014). Using a Spoken Dialogue System for Crowdsourcing Street-level Geographic Information. Presented at 2nd Workshop on Action, Perception and Language, SLTC 2014.

[89]

Oertel, C., Funes, K., Sheiki, S., Odobez, J.-M., Gustafson, J. (2014). Who will get the grant? : A multimodal corpus for the analysis of conversational behaviours in group interviews. In UM3I 2014 - Proceedings of the 2014 ACM Workshop on Understanding and Modeling Multiparty, Multimodal Interactions, Co-located with ICMI 2014. (pp. 27-32). Association for Computing Machinery (ACM).

[90]

Meena, R., Skantze, G., Gustafson, J. (2013). A Data-driven Model for Timing Feedback in a Map Task Dialogue System. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial. (pp. 375-383). Metz, France.

[91]

Al Moubayed, S., Edlund, J., Gustafson, J. (2013). Analysis of gaze and speech patterns in three-party quiz game interaction. In Interspeech 2013. (pp. 1126-1130). The International Speech Communication Association (ISCA).

[92]

Johansson, M., Skantze, G., Gustafson, J. (2013). Head Pose Patterns in Multiparty Human-Robot Team-Building Interactions. In Social Robotics: 5th International Conference, ICSR 2013, Bristol, UK, October 27-29, 2013, Proceedings. (pp. 351-360). Springer.

[93]

Meena, R., Skantze, G., Gustafson, J. (2013). Human Evaluation of Conceptual Route Graphs for Interpreting Spoken Route Descriptions. In Proceedings of the 3rd International Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI). (pp. 30-35). Potsdam, Germany.

[94]

Bollepalli, B., Beskow, J., Gustafsson, J. (2013). Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks. In Advances in nonlinear speech processing: 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013 : proceedings. (pp. 97-103). Springer Berlin/Heidelberg.

[95]

Edlund, J., Al Moubayed, S., Tånnander, C., Gustafson, J. (2013). Temporal precision and reliability of audience response system based annotation. In Proc. of Multimodal Corpora 2013..

[96]

Oertel, C., Salvi, G., Götze, J., Edlund, J., Gustafson, J., Heldner, M. (2013). The KTH Games Corpora : How to Catch a Werewolf. In IVA 2013 Workshop Multimodal Corpora: Beyond Audio and Video: MMC 2013..

[97]

Meena, R., Skantze, G., Gustafson, J. (2013). The Map Task Dialogue System : A Test-bed for Modelling Human-Like Dialogue. In 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial. (pp. 366-368). Metz, France.

[98]

Meena, R., Skantze, G., Gustafson, J. (2012). A Chunking Parser for Semantic Interpretation of Spoken Route Directions in Human-Robot Dialogue. In Proceedings of the 4th Swedish Language Technology Conference (SLTC 2012). (pp. 55-56). Lund, Sweden.

[99]

Meena, R., Skantze, G., Gustafson, J. (2012). A data-driven approach to understanding spoken route directions in human-robot dialogue. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. (pp. 226-229).

[100]

Blomberg, M., Skantze, G., Al Moubayed, S., Gustafson, J., Beskow, J., Granström, B. (2012). Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis. In Proceedings of WOCCI. Portland, OR: The International Society for Computers and Their Applications (ISCA).

[101]

Neiberg, D., Gustafson, J. (2012). Cues to perceived functions of acted and spontaneous feedback expressions. In Proceedings of theInterdisciplinary Workshop on Feedback Behaviors in Dialog. (pp. 53-56).

[102]

Neiberg, D., Gustafson, J. (2012). Exploring the implications for feedback of a neurocognitive theory of overlapped speech. In Proceedings of Workshop on Feedback Behaviors in Dialog. (pp. 57-60).

[103]

Skantze, G., Al Moubayed, S., Gustafson, J., Beskow, J., Granström, B. (2012). Furhat at Robotville : A Robot Head Harvesting the Thoughts of the Public through Multi-party Dialogue. In Proceedings of the Workshop on Real-time Conversation with Virtual Agents IVA-RCVA..

[104]

Al Moubayed, S., Beskow, J., Granström, B., Gustafson, J., Mirning, N., Skantze, G., Tscheligi, M. (2012). Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space. In Proc of LREC Workshop on Multimodal Corpora. Istanbul, Turkey.

[105]

Oertel, C., Wlodarczak, M., Edlund, J., Wagner, P., Gustafson, J. (2012). Gaze Patterns in Turn-Taking. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 3. (pp. 2243-2246). Portland, Oregon, US.

[106]

Bollepalli, B., Beskow, J., Gustafson, J. (2012). HMM based speech synthesis system for Swedish Language. In The Fourth Swedish Language Technology Conference. Lund, Sweden.

[107]

Edlund, J., Oertel, C., Gustafson, J. (2012). Investigating negotiation for load-time in the GetHomeSafe project. In Proc. of Workshop on Innovation and Applications in Speech Technology (IAST). (pp. 45-48). Dublin, Ireland.

[108]

Al Moubayed, S., Skantze, G., Beskow, J., Stefanov, K., Gustafson, J. (2012). Multimodal Multiparty Social Interaction with the Furhat Head. Presented at 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA. (pp. 293-294). Association for Computing Machinery (ACM).

[109]

Edlund, J., Heldner, M., Gustafson, J. (2012). On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 2. (pp. 1482-1485).

[110]

Boye, J., Fredriksson, M., Götze, J., Gustafson, J., Königsmann, J. (2012). Walk this way : Spatial grounding for city exploration. In IWSDS..

[111]

Edlund, J., Heldner, M., Gustafson, J. (2012). Who am I speaking at? : perceiving the head orientation of speakers from acoustic cues alone. In Proc. of LREC Workshop on Multimodal Corpora 2012. Istanbul, Turkey.

[112]

Neiberg, D., Gustafson, J. (2011). A Dual Channel Coupled Decoder for Fillers and Feedback. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. (pp. 3097-3100).

[113]

Johnson-Roberson, M., Bohg, J., Skantze, G., Gustafsson, J., Carlson, R., Kragic, D., Rasolzadeh, B. (2011). Enhanced Visual Scene Understanding through Human-Robot Dialog. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. (pp. 3342-3348). IEEE.

[114]

Neiberg, D., Gustafson, J. (2011). Predicting Speaker Changes and Listener Responses With And Without Eye-contact. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. (pp. 1576-1579). Florence, Italy.

[115]

Neiberg, D., Ananthakrishnan, G., Gustafson, J. (2011). Tracking pitch contours using minimum jerk trajectories. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. (pp. 2056-2059).

[116]

Johansson, M., Skantze, G., Gustafson, J. (2011). Understanding route directions in human-robot dialogue. In Proceedings of SemDial. (pp. 19-27). Los Angeles, CA.

[117]

Gustafson, J., Neiberg, D. (2010). Directing conversation using the prosody of mm and mhm. In Proceedings of SLTC 2010. (pp. 15-16). Linköping, Sweden.

[118]

Johnson-Roberson, M., Bohg, J., Kragic, D., Skantze, G., Gustafson, J., Carlson, R. (2010). Enhanced visual scene understanding through human-robot dialog. In Dialog with Robots: AAAI 2010 Fall Symposium..

[119]

Beskow, J., Edlund, J., Granström, B., Gustafsson, J., House, D. (2010). Face-to-Face Interaction and the KTH Cooking Show. In Development of multimodal interfaces: Active listing and synchrony. (pp. 157-168).

[120]

Neiberg, D., Gustafson, J. (2010). Modeling Conversational Interaction Using Coupled Markov Chains. In Proceedings of DiSS-LPSS Joint Workshop 2010..

[121]

Gustafson, J., Neiberg, D. (2010). Prosodic cues to engagement in non-lexical response tokens in Swedish. In Proceedings of DiSS-LPSS Joint Workshop 2010. Tokyo, Japan.

[122]

Schötz, S., Beskow, J., Bruce, G., Granström, B., Gustafson, J. (2010). Simulating Intonation in Regional Varieties of Swedish. In Speech Prosody 2010. Chicago, USA.

[123]

Neiberg, D., Gustafson, J. (2010). The Prosody of Swedish Conversational Grunts. In 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010. (pp. 2562-2565).

[124]

Skantze, G., Gustafson, J. (2009). Attention and interaction control in a human-human-computer dialogue setting. In Proceedings of SIGDIAL 2009: the 10th Annual Meeting of the Special Interest Group in Discourse and Dialogue. (pp. 310-313).

[125]

Gustafson, J., Merkes, M. (2009). Eliciting interactional phenomena in human-human dialogues. In Proceedings of the SIGDIAL 2009 Conference: 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue. (pp. 298-301).

[126]

Skantze, G., Gustafson, J. (2009). Multimodal interaction control in the MonAMI Reminder. In Proceedings of DiaHolmia: 2009 Workshop on the Semantics and Pragmatics of Dialogue. (pp. 127-128).

[127]

Beskow, J., Edlund, J., Granström, B., Gustafson, J., Skantze, G., Tobiasson, H. (2009). The MonAMI Reminder : a spoken dialogue system for face-to-face interaction. In Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009. (pp. 300-303). Brighton, U.K.

[128]

Gustafson, J., Edlund, J. (2008). EXPROS : A toolkit for exploratory experimentation with prosody in customized diphone voices. In Perception In Multimodal Dialogue Systems, Proceedings. (pp. 293-296).

[129]

Beskow, J., Edlund, J., Granström, B., Gustafson, J., Skantze, G. (2008). Innovative interfaces in MonAMI : The Reminder. In Perception In Multimodal Dialogue Systems, Proceedings. (pp. 272-275).

[130]

Gustafson, J., Heldner, M., Edlund, J. (2008). Potential benefits of human-like dialogue behaviour in the call routing domain. In Perception In Multimodal Dialogue Systems, Proceedings. (pp. 240-251).

[131]

Strangert, E., Gustafson, J. (2008). Subject ratings, acoustic measurements and synthesis of good-speaker characteristics. In Proceedings of Interspeech 2008. (pp. 1688-1691).

[132]

Strangert, E., Gustafson, J. (2008). What makes a good speaker? : Subject ratings, acoustic measurements and perceptual evaluations. In Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. (pp. 1688-1691).

[133]

Bell, L., Gustafson, J. (2007). Children’s convergence in referring expressions to graphical objects in a speech-enabled computer game. In 8th Annual Conference of the International Speech Communication Association. (pp. 2788-2791). Antwerp, Belgium.

[134]

Edlund, J., Heldner, M., Gustafson, J. (2006). Two faces of spoken dialogue systems. In Interspeech 2006 - ICSLP Satellite Workshop Dialogue on Dialogues: Multidisciplinary Evaluation of Advanced Speech-based Interactive Systems. Pittsburgh PA, USA.

[135]

Boye, J., Gustafson, J. (2005). How to do dialogue in a fairy-tale world. In Proceedings of the 6th SIGDial workshop on discourse and dialogue..

[136]

Gustafson, J., Boye, J., Fredriksson, M., Johannesson, L., Königsmann, J. (2005). Providing computer game characters with conversational abilities. In INTELLIGENT VIRTUAL AGENTS, PROCEEDINGS. (pp. 37-51). Kos, Greece.

[137]

Bell, L., Boye, J., Gustafson, J., Heldner, M., Lindström, A., Wirén, M. (2005). The Swedish NICE Corpus : Spoken dialogues between children and embodied characters in a computer game scenario. In 9th European Conference on Speech Communication and Technology. (pp. 2765-2768). Lisbon, Portugal.

[138]

Boye, J., Wirén, M., Gustafson, J. (2004). Contextual reasoning in multimodal dialogue systems : two case studies. In Proceedings of The 8th Workshop on the Semantics and Pragmatics of Dialogue Catalogue'04. (pp. 19-21). Barcelona.

[139]

Gustafson, J., Bell, L., Boye, J., Lindström, A., Wirén, M. (2004). The NICE fairy-tale game system. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004. Boston.

[140]

Gustafson, J., Sjölander, K. (2004). Voice creations for conversational fairy-tale characters. In Proc 5th ISCA speech synthesis workshop. (pp. 145-150). Pittsburgh.

[141]

Gustafson, J., Bell, L., Johan, B., Edlund, J., Wirn, M. (2002). Constraint Manipulation and Visualization in a Multimodal Dialogue System. In Proceedings of MultiModal Dialogue in Mobile Environments..

[142]

Gustafson, J., Sjölander, K. (2002). Voice Transformations For Improving Children's Speech Recognition In A Publicly Available Dialogue System. In Proceedings of ICSLP 02. (pp. 297-300). International Speech Communication Association.

[143]

Bell, L., Boye, J., Gustafson, J. (2001). Real-time Handling of Fragmented Utterances. In Proceedings of the NAACL Workshop on Adaption in Dialogue Systems..

[144]

Bell, L., Eklund, R., Gustafson, J. (2000). A Comparison of Disfluency Distribution in a Unimodal and a Multimodal Speech Interface. In Proceedings of ICSLP 00..

[145]

Bell, L., Boye, J., Gustafson, J., Wirén, M. (2000). Modality Convergence in a Multimodal Dialogue System. In Proceedings of Götalog. (pp. 29-34).

[146]

Bell, L., Gustafson, J. (2000). Positive and Negative User Feedback in a Spoken Dialogue Corpus. In Proceedings of ICSLP 00..

[147]

Bell, L., Gustafson, J. (1999). Repetition and its phonetic realizations : investigating a Swedish database of spontaneous computer directed speech. In Proceedings of the XIVth International Congress of Phonetic Sciences. (p. 1221).

[148]

Gustafson, J., Larsson, A., Carlson, R., Hellman, K. (1997). How do System Questions Influence Lexical Choices in User Answers?. In Proceedings of Eurospeech '97, 5th European Conference on Speech Communication and Technology : Rhodes, Greece, 22 - 25 September 1997. (pp. 2275-2278). Grenoble: European Speech Communication Association (ESCA).

Chapters in books

[149]

Skantze, G., Gustafson, J. & Beskow, J. (2019). Multimodal Conversational Interaction with Robots. In Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger (Ed.), The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. ACM Press.

[150]

Boye, J., Fredriksson, M., Götze, J., Gustafson, J. & Königsmann, J. (2014). Walk this way : Spatial grounding for city exploration. In Natural interaction with robots, knowbots and smartphones (pp. 59-67). Springer-Verlag.

[151]

Edlund, J. & Gustafson, J. (2010). Ask the experts : Part II: Analysis. In Juel Henrichsen, Peter (Ed.), Linguistic Theory and Raw Sound (pp. 183-198). Frederiksberg: Samfundslitteratur.

[152]

Gustafson, J. & Edlund, J. (2010). Ask the experts - Part I: Elicitation. In Juel Henrichsen, Peter (Ed.), Linguistic Theory and Raw Sound (pp. 169-182). Samfundslitteratur.

[153]

Edlund, J., Heldner, M. & Gustafson, J. (2005). Utterance segmentation and turn-taking in spoken dialogue systems. In Fisseni, B.; Schmitz, H-C.; Schröder, B.; Wagner, P. (Ed.), Computer Studies in Language and Speech (pp. 576-587). Frankfurt am Main, Germany: Peter Lang.

Non-peer reviewed

Conference papers

[154]

Lameris, H., Mehta, S., Henter, G. E., Kirkland, A., Moëll, B., O'Regan, J., Gustafsson, J., Székely, É. (2022). Spontaneous Neural HMM TTS with Prosodic Feature Modification. In Proceedings of Fonetik 2022..

[155]

Edlund, J., Al Moubayed, S., Tånnander, C., Gustafson, J. (2013). Audience response system based annotation of speech. In Proceedings of Fonetik 2013. (pp. 13-16). Linköping: Linköping University.

[156]

Al Moubayed, S., Beskow, J., Blomberg, M., Granström, B., Gustafson, J., Mirning, N., Skantze, G. (2012). Talking with Furhat - multi-party interaction with a back-projected robot head. In Proceedings of Fonetik 2012. (pp. 109-112). Gothenberg, Sweden.

[157]

Neiberg, D., Gustafson, J. (2012). Towards letting machines humming in the right way : prosodic analysis of six functions of short feedback tokens in English. In Proceedings of Fonetik..

[158]

Edlund, J., Gustafson, J., Beskow, J. (2010). Cocktail : a demonstration of massively multi-component audio environments for illustration and analysis. In SLTC 2010, The Third Swedish Language Technology Conference (SLTC 2010): Proceedings of the Conference..

[159]

Beskow, J., Edlund, J., Gustafson, J., Heldner, M., Hjalmarsson, A., House, D. (2010). Modelling humanlike conversational behaviour. In SLTC 2010: The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference. (pp. 9-10). Linköping, Sweden.

[160]

Neiberg, D., Gustafson, J. (2010). Prosodic Characterization and Automatic Classification of Conversational Grunts in Swedish. In Working Papers 54: Proceedings from Fonetik 2010..

[161]

Beskow, J., Edlund, J., Gustafson, J., Heldner, M., Hjalmarsson, A., House, D. (2010). Research focus : Interactional aspects of spoken face-to-face communication. In Proceedings from Fonetik, Lund, June 2-4, 2010: . (pp. 7-10). Lund, Sweden: Lund University.

[162]

Schötz, S., Beskow, J., Bruce, G., Granström, B., Gustafson, J., Segerup, M. (2010). Simulating Intonation in Regional Varieties of Swedish. In Fonetik 2010. Lund, Sweden.

[163]

Beskow, J., Gustafson, J. (2009). Experiments with Synthesis of Swedish Dialects. In Proceedings of Fonetik 2009. (pp. 28-29). Stockholm: Stockholm University.

[164]

Gustafson, J., Edlund, J. (2008). EXPROS : Tools for exploratory experimentation with prosody. In Proceedings of FONETIK 2008. (pp. 17-20). Gothenburg, Sweden.

[165]

Strangert, E., Gustafson, J. (2008). Improving speaker skill in a resynthesis experiment. In Proceedings FONETIK 2008: The XXIst Swedish Phonetics Conference. (pp. 69-72).

[166]

Beskow, J., Edlund, J., Granström, B., Gustafson, J., Jonsson, O., Skantze, G. (2008). Speech technology in the European project MonAMI. In Proceedings of FONETIK 2008. (pp. 33-36). Gothenburg, Sweden: University of Gothenburg.

Chapters in books

[167]

Bertenstam, J., Mats, B., Carlson, R., Elenius, K., Granström, B., Gustafson, J. ... Ström, N. (1995). Spoken dialogue data collected in the Waxholm project. In Quarterly progress and status report: April 15, 1995 /Speech Transmission Laboratory ( (1 ed.) pp. 50-73). Stockholm: KTH.

Theses

[168]

Gustafson, J. (2002). Developing Multimodal Spoken Dialogue Systems : Empirical Studies of Spoken Human–Computer Interaction (Doctoral thesis , KTH, Stockholm, Trita-TMH 2002:8). Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3460.

Other

[169]

Wang, S., Henter, G. E., Gustafsson, J., Székely, É. (2023). A comparative study of self-supervised speech representationsin read and spontaneous TTS. (Manuscript).

[170]

Jonell, P., Mendelson, J., Storskog, T., Hagman, G., Östberg, P., Leite, I. ... Kjellström, H. (2017). Machine Learning and Social Robotics for Detecting Early Signs of Dementia..

Latest sync with DiVA:

2025-12-21 00:08:06 UTC

Studies

Research

Collaboration

About KTH

Library

Publications by Joakim Gustafsson

Peer reviewed

Articles

Conference papers

Chapters in books

Non-peer reviewed

Conference papers

Chapters in books

Theses

Other

Contact