Publikationer av Jonas Beskow

Refereegranskade

Artiklar

[1]

B. Moell, F. Farestam och J. Beskow, "Swedish Medical LLM Benchmark : Development and evaluation of a framework for assessing large language models in the Swedish medical domain," Frontiers in Artificial Intelligence, vol. 8, 2025.

[2]

A. Deichler et al., "Learning to generate pointing gestures in situated embodied conversational agents," Frontiers in Robotics and AI, vol. 10, 2023.

[3]

S. Alexanderson et al., "Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models," ACM Transactions on Graphics, vol. 42, no. 4, 2023.

[4]

M. Cohn et al., "Vocal accommodation to technology: the role of physical form," Language Sciences, vol. 99, 2023.

[5]

P. Jonell et al., "Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia : Clinical Feasibility and Preliminary Results," Frontiers in Computer Science, vol. 3, 2021.

[6]

G. Valle-Perez et al., "Transflower : probabilistic autoregressive dance generation with multimodal attention," ACM Transactions on Graphics, vol. 40, no. 6, 2021.

[7]

G. E. Henter, S. Alexanderson och J. Beskow, "MoGlow : Probabilistic and controllable motion synthesis using normalising flows," ACM Transactions on Graphics, vol. 39, no. 6, s. 1-14, 2020.

[8]

K. Stefanov, J. Beskow och G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, s. 250-259, 2020.

[9]

S. Alexanderson et al., "Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows," Computer graphics forum (Print), vol. 39, no. 2, s. 487-496, 2020.

[10]

K. Stefanov et al., "Modeling of Human Visual Attention in Multiparty Open-World Dialogues," ACM Transactions on Human-Robot Interaction, vol. 8, no. 2, 2019.

[11]

S. Alexanderson et al., "Mimebot—Investigating the Expressibility of Non-Verbal Communication Across Agent Embodiments," ACM Transactions on Applied Perception, vol. 14, no. 4, 2017.

[12]

S. Alexanderson, C. O'Sullivan och J. Beskow, "Real-time labeling of non-rigid motion capture marker sets," Computers & graphics, vol. 69, no. Supplement C, s. 59-67, 2017.

[13]

S. Alexanderson och J. Beskow, "Towards Fully Automated Motion Capture of Signs -- Development and Evaluation of a Key Word Signing Avatar," ACM Transactions on Accessible Computing, vol. 7, no. 2, s. 7:1-7:17, 2015.

[14]

S. Alexanderson och J. Beskow, "Animated Lombard speech : Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions," Computer speech & language (Print), vol. 28, no. 2, s. 607-618, 2014.

[15]

N. Mirnig et al., "Face-To-Face With A Robot : What do we actually talk about?," International Journal of Humanoid Robotics, vol. 10, no. 1, s. 1350011, 2013.

[16]

S. Al Moubayed, G. Skantze och J. Beskow, "The Furhat Back-Projected Humanoid Head-Lip Reading, Gaze And Multi-Party Interaction," International Journal of Humanoid Robotics, vol. 10, no. 1, s. 1350005, 2013.

[17]

S. Al Moubayed, J. Edlund och J. Beskow, "Taming Mona Lisa : communicating gaze faithfully in 2D and 3D facial projections," ACM Transactions on Interactive Intelligent Systems, vol. 1, no. 2, s. 25, 2012.

[18]

S. Al Moubayed, J. Beskow och B. Granström, "Auditory visual prominence From intelligibility to behavior," Journal on Multimodal User Interfaces, vol. 3, no. 4, s. 299-309, 2009.

[19]

J. Edlund och J. Beskow, "MushyPeek : A Framework for Online Investigation of Audiovisual Dialogue Phenomena," Language and Speech, vol. 52, s. 351-367, 2009.

[20]

G. Salvi et al., "SynFace-Speech-Driven Facial Animation for Virtual Speech-Reading Support," Eurasip Journal on Audio, Speech, and Music Processing, vol. 2009, s. 191940, 2009.

[21]

J. Beskow et al., "Visualization of speech and audio for hearing-impaired persons," Technology and Disability, vol. 20, no. 2, s. 97-107, 2008.

[22]

B. Lidestam och J. Beskow, "Motivation and appraisal in perception of poorly specified speech," Scandinavian Journal of Psychology, vol. 47, no. 2, s. 93-101, 2006.

[23]

B. Lidestam och J. Beskow, "Visual phonemic ambiguity and speechreading," Journal of Speech, Language and Hearing Research, vol. 49, no. 4, s. 835-847, 2006.

[24]

J. Beskow, "Trainable articulatory control models for visual speech synthesis," International Journal of Speech Technology, vol. 7, no. 4, s. 335-349, 2004.

Konferensbidrag

[25]

L. Marcinek, J. Beskow och J. Gustafsson, "A Dual-Control Dialogue Framework for Human-Robot Interaction Data Collection : Integrating Human Emotional and Contextual Awareness with Conversational AI," i Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings, 2025, s. 290-297.

[26]

C. Tånnander et al., "Intrasentential English in Swedish TTS : perceived English-accentedness," i Interspeech 2025, 2025, s. 1638-1642.

[27]

L. Marcinek, J. Beskow och J. Gustafsson, "Towards Adaptable and Intelligible Speech Synthesis in Noisy Environments," i Interspeech 2025, 2025, s. 2165-2169.

[28]

L. Marcinek, J. Beskow och J. Gustafsson, "A dual-control dialogue framework for human-robot interaction data collection : integrating human emotional and contextual awareness with conversational AI," i International Conference of Social Robotics (ICSR 2024), 2024.

[29]

C. Tånnander et al., "Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis," i Interspeech 2024, 2024, s. 2815-2819.

[30]

F. Malmberg et al., "Exploring Latent Sign Language Representations with Isolated Signs, Sentences and In-the-Wild Data," i 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources, sign-lang@LREC-COLING 2024, 2024, s. 219-224.

[31]

S. Mehta et al., "Fake it to make it : Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis," i Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, s. 1952-1964.

[32]

S. Mehta et al., "Fake it to make it : Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis," i Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024, 2024, s. 1952-1964.

[33]

A. W. Werner, J. Beskow och A. Deichler, "Gesture Evaluation in Virtual Reality," i ICMI Companion 2024 - Companion Publication of the 26th International Conference on Multimodal Interaction, 2024, s. 156-164.

[34]

A. Deichler, S. Alexanderson och J. Beskow, "Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents," i Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, IVA 2024, 2024.

[35]

S. Mehta et al., "MATCHA-TTS: A FAST TTS ARCHITECTURE WITH CONDITIONAL FLOW MATCHING," i 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings, 2024, s. 11341-11345.

[36]

C. Tånnander et al., "Prosodic characteristics of English-accented Swedish neural TTS," i Proceedings of Speech Prosody 2024, 2024, s. 1035-1039.

[37]

S. Mehta et al., "Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech," i Interspeech 2024, 2024, s. 2285-2289.

[38]

S. Mehta et al., "Unified speech and gesture synthesis using flow matching," i 2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, s. 8220-8224.

[39]

J. Gustafsson et al., "Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters," i 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition, FG 2023, 2023.

[40]

S. Mehta et al., "Diff-TTSG : Denoising probabilistic integrated speech and gesture synthesis," i Proceedings 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, 2023, s. 150-156.

[41]

A. Deichler et al., "Difusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation," i PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, 2023, s. 755-762.

[42]

J. Gustafsson, É. Székely och J. Beskow, "Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters," i 23rd ACM International Conference on Interlligent Virtual Agent (IVA 2023), 2023.

[43]

J. Miniotaitė et al., "Hi robot, it's not what you say, it's how you say it," i 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023, s. 307-314.

[44]

S. Mehta et al., "OverFlow : Putting flows on top of neural transducers for better TTS," i Interspeech 2023, 2023, s. 4279-4283.

[45]

S. Mehta et al., "Neural HMMs are all you need (for high-quality attention-free TTS)," i 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, s. 7457-7461.

[46]

B. Moell et al., "Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge," i The RaPID4 Workshop : Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments, 2022, s. 62-70.

[47]

A. Deichler et al., "Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation," i Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at 2022 ACM/IEEE International Conference on Human-Robot Interaction, 2022, s. 2022.

[48]

J. Beskow et al., "Expressive Robot Performance based on Facial Motion Capture," i INTERSPEECH 2021, 2021, s. 2343-2344.

[49]

J. Beskow et al., "Expressive robot performance based on facial motion capture," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2021, s. 2165-2166.

[50]

S. Wang et al., "Integrated Speech and Gesture Synthesis," i ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, s. 177-185.

[51]

P. Jonell et al., "Mechanical Chameleons : Evaluating the effects of a social robot’snon-verbal behavior on social influence," i Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021, 2021.

[52]

K. Chhatre et al., "Spatio-temporal priors in 3D human motion," i IEEE ICDL Workshop on Spatio-temporal Aspects of Embodied Predictive Processing, 2021.

[53]

É. Székely et al., "Breathing and Speech Planning in Spontaneous Speech Synthesis," i 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, s. 7649-7653.

[54]

P. Jonell et al., "Can we trust online crowdworkers? : Comparing online and offline participants in a preference test of virtual agents.," i IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, 2020.

[55]

M. Cohn et al., "Embodiment and gender interact in alignment to TTS voices," i Proceedings for the 42nd Annual Meeting of the Cognitive Science Society : Developing a Mind: Learning in Humans, Animals, and Machines, CogSci 2020, 2020, s. 220-226.

[56]

S. Alexanderson et al., "Generating coherent spontaneous speech and gesture from text," i Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, 2020.

[57]

P. Jonell et al., "Let’s face it : Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings," i IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, 2020.

[58]

K. Håkansson et al., "Robot-assisted detection of subclinical dementia : progress report and preliminary findings," i In 2020 Alzheimer's Association International Conference. ALZ., 2020.

[59]

C. Chen et al., "Equipping social robots with culturally-sensitive facial expressions of emotion using data-driven methods," i 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), 2019, s. 1-8.

[60]

É. Székely et al., "How to train your fillers: uh and um in spontaneous speech synthesis," i The 10th ISCA Speech Synthesis Workshop, 2019.

[61]

P. Jonell et al., "Learning Non-verbal Behavior for a Social Robot from YouTube Videos," i ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019, 2019.

[62]

Z. Malisz et al., "Modern speech synthesis for phonetic sciences : A discussion and an evaluation," i Proceedings of ICPhS, 2019.

[63]

É. Székely et al., "Off the cuff : Exploring extemporaneous speech delivery with TTS," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, s. 3687-3688.

[64]

J. Beskow, "On Talking Heads, Social Robots and what they can Teach us," i Proceedings of ICPhS, 2019.

[65]

Z. Malisz et al., "PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network," i Proceedings of SSW 10 - The 10th ISCA Speech Synthesis Workshop, 2019.

[66]

P. Wagner et al., "Speech Synthesis Evaluation : State-of-the-Art Assessment and Suggestion for a Novel Research Program," i Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019.

[67]

É. Székely et al., "Spontaneous conversational speech synthesis from found data," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, s. 4435-4439.

[68]

Z. Malisz et al., "The speech synthesis phoneticians need is both realistic and controllable," i Proceedings from FONETIK 2019, 2019.

[69]

Z. Malisz, P. Jonell och J. Beskow, "The visual prominence of whispered speech in Swedish," i Proceedings of 19th International Congress of Phonetic Sciences, 2019.

[70]

D. Kontogiorgos et al., "A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction," i Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, s. 119-127.

[71]

P. Jonell et al., "Crowdsourced Multimodal Corpora Collection Tool," i Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, s. 728-734.

[72]

H. -. Vögel et al., "Emotion-awareness for intelligent vehicle assistants : A research agenda," i Proceedings - International Conference on Software Engineering, 2018, s. 11-15.

[73]

C. Chen et al., "Reverse engineering psychologically valid facial expressions of emotion into social robots," i 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, s. 448-452.

[74]

A. E. Vijayan et al., "Using Constrained Optimization for Real-Time Synchronization of Verbal and Nonverbal Robot Behavior," i 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, s. 1955-1961.

[75]

K. Stefanov och J. Beskow, "A Real-time Gesture Recognition System for Isolated Swedish Sign Language Signs," i Proceedings of the 4th European and 7th Nordic Symposium on Multimodal Communication (MMSYM 2016), 2017.

[76]

Z. Malisz et al., "Controlling prominence realisation in parametric DNN-based speech synthesis," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 2017, s. 1079-1083.

[77]

C. Oertel et al., "Crowd-Sourced Design of Artificial Attentive Listeners," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017, s. 854-858.

[78]

P. Jonell et al., "Crowd-powered design of virtual attentive listeners," i 17th International Conference on Intelligent Virtual Agents, IVA 2017, 2017, s. 188-191.

[79]

C. Oertel et al., "Crowdsourced design of artificial attentive listeners," i INTERSPEECH: Situated Interaction, Augusti 20-24 Augusti, 2017, 2017.

[80]

Y. Zhang, J. Beskow och H. Kjellström, "Look but Don’t Stare : Mutual Gaze Interaction in Social Robots," i 9th International Conference on Social Robotics, ICSR 2017, 2017, s. 556-566.

[81]

M. S. L. Khan et al., "Moveable facial features in a social mediator," i 17th International Conference on Intelligent Virtual Agents, IVA 2017, 2017, s. 205-208.

[82]

J. Beskow et al., "Preface," i 17th International Conference on Intelligent Virtual Agents, IVA 2017, 2017, s. V-VI.

[83]

K. Stefanov, J. Beskow och G. Salvi, "Vision-based Active Speaker Detection in Multiparty Interaction," i Grounding Language Understanding, 2017.

[84]

K. Stefanov och J. Beskow, "A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction," i Proceedings of the 10th edition of the Language Resources and Evaluation Conference, 2016.

[85]

J. Beskow och H. Berthelsen, "A hybrid harmonics-and-bursts modelling approach to speech synthesis," i Proceedings 9th ISCA Speech Synthesis Workshop, SSW 2016, 2016, s. 208-213.

[86]

S. Alexanderson, D. House och J. Beskow, "Automatic annotation of gestural units in spontaneous face-to-face interaction," i MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, 2016, s. 15-19.

[87]

K. Stefanov och J. Beskow, "Gesture Recognition System for Isolated Sign Language Signs," i The 4th European and 7th Nordic Symposium on Multimodal Communication, 29-30 September 2016, University of Copenhagen, Denmark, 2016, s. 57-59.

[88]

K. Stefanov, A. Sugimoto och J. Beskow, "Look Who’s Talking : Visual Identification of the Active Speaker in Multi-party Human-robot Interaction," i 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction 2016, ASSP4MI 2016 - Held in conjunction with the 18th ACM International Conference on Multimodal Interaction 2016, ICMI 2016, 2016, s. 22-27.

[89]

S. Alexanderson, C. O'Sullivan och J. Beskow, "Robust online motion capture labeling of finger markers," i Proceedings - Motion in Games 2016 : 9th International Conference on Motion in Games, MIG 2016, 2016, s. 7-13.

[90]

J. Beskow, "Spoken and non-verbal interaction experiments with a social robot," i The Journal of the Acoustical Society of America, 2016.

[91]

G. Skantze, M. Johansson och J. Beskow, "A Collaborative Human-Robot Game as a Test-bed for Modelling Multi-party, Situated Interaction," i INTELLIGENT VIRTUAL AGENTS, IVA 2015, 2015, s. 348-351.

[92]

G. Skantze, M. Johansson och J. Beskow, "Exploring Turn-taking Cues in Multi-party Human-robot Discussions about Objects," i Proceedings of the 2015 ACM International Conference on Multimodal Interaction, 2015.

[93]

S. Al Moubayed et al., "Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue," i 9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany, 2014.

[94]

S. Al Moubayed, J. Beskow och G. Skantze, "Spontaneous spoken dialogues with the Furhat human-like robot head," i HRI '14 Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, 2014, s. 326.

[95]

J. Beskow et al., "Tivoli - Learning Signs Through Games and Interaction for Children with Communicative Disorders," i 6th Biennial Conference of the International Society for Augmentative and Alternative Communication, Lisbon, Portugal, 2014.

[96]

S. Al Moubayed et al., "Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor," i 9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal, 2014.

[97]

K. Stefanov och J. Beskow, "A Kinect Corpus of Swedish Sign Language Signs," i Proceedings of the 2013 Workshop on Multimodal Corpora : Beyond Audio and Video, 2013.

[98]

S. Alexanderson, D. House och J. Beskow, "Aspects of co-occurring syllables and head nods in spontaneous dialogue," i Proceedings of 12th International Conference on Auditory-Visual Speech Processing (AVSP2013), 2013, s. 169-172.

[99]

S. Alexanderson, D. House och J. Beskow, "Extracting and analysing co-speech head gestures from motion-capture data," i Proceedings of Fonetik 2013, 2013, s. 1-4.

[100]

S. Alexanderson, D. House och J. Beskow, "Extracting and analyzing head movements accompanying spontaneous dialogue," i Conference Proceedings TiGeR 2013 : Tilburg Gesture Research Meeting, 2013.

[101]

B. Bollepalli, J. Beskow och J. Gustafsson, "Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks," i Advances in nonlinear speech processing : 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013 : proceedings, 2013, s. 97-103.

[102]

S. Al Moubayed, J. Beskow och G. Skantze, "The Furhat Social Companion Talking Head," i Interspeech 2013 - Show and Tell, 2013, s. 747-749.

[103]

J. Beskow et al., "The Tivoli System - A Sign-driven Game for Children with Communicative Disorders," i 1st Symposium on Multimodal Communication, Msida, Malta, 2013.

[104]

J. Beskow och K. Stefanov, "Web-enabled 3D Talking Avatars Based on WebGL and HTML5," i 13th International Conference on Intelligent Virtual Agents, Edinburgh, UK, 2013.

[105]

J. Edlund et al., "3rd party observer gaze as a continuous measure of dialogue flow," i Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, 2012, s. 1354-1358.

[106]

S. Alexanderson och J. Beskow, "Can Anybody Read Me? Motion Capture Recordings for an Adaptable Visual Speech Synthesizer," i In proceedings of The Listening Talker, 2012, s. 52-52.

[107]

M. Blomberg et al., "Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis," i Proceedings of WOCCI, 2012.

[108]

S. Al Moubayed et al., "Furhat : A Back-projected Human-like Robot Head for Multiparty Human-Machine Interaction," i Cognitive Behavioural Systems : COST 2102 International Training School, Dresden, Germany, February 21-26, 2011, Revised Selected Papers, 2012, s. 114-130.

[109]

G. Skantze et al., "Furhat at Robotville : A Robot Head Harvesting the Thoughts of the Public through Multi-party Dialogue," i Proceedings of the Workshop on Real-time Conversation with Virtual Agents IVA-RCVA, 2012.

[110]

S. Al Moubayed et al., "Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space," i Proc of LREC Workshop on Multimodal Corpora, 2012.

[111]

B. Bollepalli, J. Beskow och J. Gustafson, "HMM based speech synthesis system for Swedish Language," i The Fourth Swedish Language Technology Conference, 2012.

[112]

S. Al Moubayed, G. Skantze och J. Beskow, "Lip-reading : Furhat audio visual intelligibility of a back projected animated face," i Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, s. 196-203.

[113]

S. Al Moubayed et al., "Multimodal Multiparty Social Interaction with the Furhat Head," i 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, 2012, s. 293-294.

[114]

S. Al Moubayed et al., "A robotic head using projected animated faces," i Proceedings of the International Conference on Audio-Visual Speech Processing 2011, 2011, s. 71.

[115]

S. Al Moubayed et al., "Animated Faces for Robotic Heads : Gaze and Beyond," i Analysis of Verbal and Nonverbal Communication and Enactment : The Processing Issues, 2011, s. 19-35.

[116]

J. Beskow et al., "Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation," i Proceedings of International Conference on Audio-Visual Speech Processing 2011, 2011, s. 103-106.

[117]

J. Edlund, S. Al Moubayed och J. Beskow, "The Mona Lisa Gaze Effect as an Objective Metric for Perceived Cospatiality," i Proc. of the Intelligent Virtual Agents 10th International Conference (IVA 2011), 2011, s. 439-440.

[118]

S. Al Moubayed et al., "Audio-Visual Prosody : Perception, Detection, and Synthesis of Prominence," i 3rd COST 2102 International Training School on Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces : Theoretical and Practical Issues, 2010, s. 55-71.

[119]

J. Edlund och J. Beskow, "Capturing massively multimodal dialogues : affordable synchronization and visualization," i Proc. of Multimodal Corpora : Advances in Capturing, Coding and Analyzing Multimodality (MMC 2010), 2010, s. 160-161.

[120]

J. Beskow et al., "Face-to-Face Interaction and the KTH Cooking Show," i Development of multimodal interfaces : Active listing and synchrony, 2010, s. 157-168.

[121]

J. Beskow och S. Al Moubayed, "Perception of Gaze Direction in 2D and 3D Facial Projections," i The ACM / SSPNET 2nd International Symposium on Facial Analysis and Animation, 2010, s. 24-24.

[122]

S. Al Moubayed och J. Beskow, "Perception of Nonverbal Gestures of Prominence in Visual Speech Animation," i Proceedings of the ACM/SSPNET 2nd International Symposium on Facial Analysis and Animation, 2010, s. 25.

[123]

S. Al Moubayed och J. Beskow, "Prominence Detection in Swedish Using Syllable Correlates," i Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, s. 1784-1787.

[124]

S. Schötz et al., "Simulating Intonation in Regional Varieties of Swedish," i Speech Prosody 2010, 2010.

[125]

J. Edlund et al., "Spontal : a Swedish spontaneous dialogue corpus of audio, video and motion capture," i Proc. of the Seventh conference on International Language Resources and Evaluation (LREC'10), 2010, s. 2992-2995.

[126]

S. Al Moubayed och J. Beskow, "Effects of Visual Prominence Cues on Speech Intelligibility," i Proceedings of Auditory-Visual Speech Processing AVSP'09, 2009.

[127]

F. López-Colino, J. Beskow och J. Colas, "Mobile Synface : Talking head interface for mobile VoIP telephone calls," i Actas del X Congreso Internacional de Interaccion Persona-Ordenador, INTERACCION 2009, 2009.

[128]

J. Beskow, G. Salvi och S. Al Moubayed, "SynFace : Verbal and Non-verbal Face Animation from Audio," i Proceedings of The International Conference on Auditory-Visual Speech Processing AVSP'09, 2009.

[129]

J. Beskow, G. Salvi och S. Al Moubayed, "SynFace - Verbal and Non-verbal Face Animation from Audio," i Auditory-Visual Speech Processing 2009, AVSP 2009, 2009.

[130]

J. Beskow et al., "The MonAMI Reminder : a spoken dialogue system for face-to-face interaction," i Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, 2009, s. 300-303.

[131]

S. Al Moubayed et al., "Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting," i INTERSPEECH 2009 : 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, 2009, s. 1443-1446.

[132]

J. Beskow och L. Cerrato, "Evaluation of the expressivity of a Swedish talking head in the context of human-machine interaction," i Comunicazione parlatae manifestazione delle emozioni : Atti del I Convegno GSCP, Padova 29 novembre - 1 dicembre 2004, 2008.

[133]

J. Beskow et al., "Hearing at Home : Communication support in home environments for hearing impaired persons," i INTERSPEECH 2008 : 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, 2008, s. 2203-2206.

[134]

J. Beskow et al., "Innovative interfaces in MonAMI : The Reminder," i Perception In Multimodal Dialogue Systems, Proceedings, 2008, s. 272-275.

[135]

J. Beskow et al., "Recognizing and Modelling Regional Varieties of Swedish," i INTERSPEECH 2008 : 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, 2008, s. 512-515.

[136]

J. Beskow, B. Granström och D. House, "Analysis and synthesis of multimodal verbal and non-verbal interaction for animated interface agents," i VERBAL AND NONVERBAL COMMUNICATION BEHAVIOURS, 2007, s. 250-263.

[137]

J. Edlund och J. Beskow, "Pushy versus meek : using avatars to influence turn-taking behaviour," i INTERSPEECH 2007 : 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, 2007, s. 2784-2787.

[138]

E. Agelfors et al., "User evaluation of the SYNFACE talking head telephone," i Computers Helping People With Special Needs, Proceedings, 2006, s. 579-586.

[139]

J. Beskow, B. Granström och D. House, "Visual correlates to prominence in several expressive modes," i INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, 2006, s. 1272-1275.

[140]

J. Beskow och M. Nordenberg, "Data-driven synthesis of expressive visual speech using an MPEG-4 talking head," i 9th European Conference on Speech Communication and Technology, 2005, s. 793-796.

[141]

O. Engwall et al., "Design strategies for a virtual language tutor," i INTERSPEECH 2004, ICSLP, 8^th International Conference on Spoken Language Processing, Jeju Island, Korea, October 4-8, 2004, 2004, s. 1693-1696.

[142]

J. Beskow et al., "Expressive animated agents for affective dialogue systems," i AFFECTIVE DIALOGUE SYSTEMS, PROCEEDINGS, 2004, s. 240-243.

[143]

J. Beskow et al., "Preliminary cross-cultural evaluation of expressiveness in synthetic faces," i Affective Dialogue Systems, Proceedings, 2004, s. 301-304.

[144]

J. Beskow et al., "SYNFACE - A talking head telephone for the hearing-impaired," i COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS : PROCEEDINGS, 2004, s. 1178-1185.

[145]

K.-E. Spens et al., "SYNFACE, a talking head telephone for the hearing impaired," i IFHOH 7th World Congress for the Hard of Hearing. Helsinki Finland. July 4-9, 2004, 2004.

[146]

J. Beskow et al., "The Swedish PFs-Star Multimodal Corpora," i Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, s. 34-37.

[147]

E. Agelfors et al., "A synthetic face as a lip-reading support for hearing impaired telephone users - problems and positive results," i European audiology in 1999 : proceeding of the 4th European Conference in Audiology, Oulu, Finland, June 6-10, 1999, 1999.

[148]

E. Agelfors et al., "Synthetic visual speech driven from auditory speech," i Proceedings of Audio-Visual Speech Processing (AVSP'99)), 1999.

Kapitel i böcker

[149]

G. Skantze, J. Gustafson och J. Beskow, "Multimodal Conversational Interaction with Robots," i The Handbook of Multimodal-Multisensor Interfaces, Volume 3 : Language Processing, Software, Commercialization, and Emerging Directions, Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger red., : ACM Press, 2019.

[150]

J. Edlund, S. Al Moubayed och J. Beskow, "Co-present or Not? : Embodiment, Situatedness and the Mona Lisa Gaze Effect," i Eye gaze in intelligent user interfaces : gaze-based analyses, models and applications, Nakano, Yukiko; Conati, Cristina; Bader, Thomas red., London : Springer London, 2013, s. 185-203.

[151]

J. Edlund, D. House och J. Beskow, "Gesture movement profiles in dialogues from a Swedish multimodal database of spontaneous speech," i Prosodic and Visual Resources in Interactional Grammar, Bergmann, Pia; Brenning, Jana; Pfeiffer, Martin C.; Reber, Elisabeth red., : Walter de Gruyter, 2012.

[152]

J. Beskow et al., "Multimodal Interaction Control," i Computers in the Human Interaction Loop, Waibel, Alexander; Stiefelhagen, Rainer red., Berlin/Heidelberg : Springer Berlin/Heidelberg, 2009, s. 143-158.

[153]

J. Beskow, J. Edlund och M. Nordstrand, "A Model for Multimodal Dialogue System Output Applied to an Animated Talking Head," i SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS, Minker, Wolfgang; Bühler, Dirk; Dybkjær, Laila red., Dordrecht : Springer, 2005, s. 93-113.

Icke refereegranskade

Konferensbidrag

[154]

D. House, S. Alexanderson och J. Beskow, "On the temporal domain of co-speech gestures: syllable, phrase or talk spurt?," i Proceedings of Fonetik 2015, 2015, s. 63-68.

[155]

S. Al Moubayed et al., "Talking with Furhat - multi-party interaction with a back-projected robot head," i Proceedings of Fonetik 2012, 2012, s. 109-112.

[156]

S. Al Moubayed och J. Beskow, "A novel Skype interface using SynFace for virtual speech reading support," i Proceedings from Fonetik 2011, June 8 - June 10, 2011 : Speech, Music and Hearing, Quarterly Progress and Status Report, TMH-OPSR, Volume 51, 2011, 2011, s. 33-36.

[157]

J. Edlund, J. Gustafson och J. Beskow, "Cocktail : a demonstration of massively multi-component audio environments for illustration and analysis," i SLTC 2010, The Third Swedish Language Technology Conference (SLTC 2010) : Proceedings of the Conference, 2010.

[158]

J. Beskow och B. Granström, "Goda utsikter för teckenspråksteknologi," i Språkteknologi för ökad tillgänglighet : Rapport från ett nordiskt seminarium, 2010, s. 77-86.

[159]

J. Beskow et al., "Modelling humanlike conversational behaviour," i SLTC 2010 : The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference, 2010, s. 9-10.

[160]

J. Beskow et al., "Research focus : Interactional aspects of spoken face-to-face communication," i Proceedings from Fonetik, Lund, June 2-4, 2010 : , 2010, s. 7-10.

[161]

S. Schötz et al., "Simulating Intonation in Regional Varieties of Swedish," i Fonetik 2010, 2010.

[162]

J. Beskow och J. Gustafson, "Experiments with Synthesis of Swedish Dialects," i Proceedings of Fonetik 2009, 2009, s. 28-29.

[163]

J. Beskow et al., "Project presentation: Spontal : multimodal database of spontaneous dialog," i Proceedings of Fonetik 2009 : The XXIIth Swedish Phonetics Conference, 2009, s. 190-193.

[164]

S. Al Moubayed et al., "Studies on Using the SynFace Talking Head for the Hearing Impaired," i Proceedings of Fonetik'09 : The XXIIth Swedish Phonetics Conference, June 10-12, 2009, 2009, s. 140-143.

[165]

J. Beskow et al., "Human Recognition of Swedish Dialects," i Proceedings of Fonetik 2008 : The XXIst Swedish Phonetics Conference, 2008, s. 61-64.

[166]

F. López-Colino, J. Beskow och J. Colás, "Mobile SynFace : Ubiquitous visual interface for mobile VoIP telephone calls," i Proceedings of The second Swedish Language Technology Conference (SLTC), 2008.

[167]

J. Beskow et al., "Speech technology in the European project MonAMI," i Proceedings of FONETIK 2008, 2008, s. 33-36.

[168]

S. Al Moubayed, J. Beskow och G. Salvi, "SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech," i Proceedings of The second Swedish Language Technology Conference (SLTC), 2008, s. 3-6.

[169]

J. Edlund, J. Beskow och M. Heldner, "MushyPeek : an experiment framework for controlled investigation of human-human interaction control behaviour," i Proceedings of Fonetik 2007, 2007, s. 61-64.

[170]

J. Beskow, B. Granström och D. House, "Focal accent and facial movements in expressive speech," i Proceedings from Fonetik 2006, Lund, June, 7-9, 2006, 2006, s. 9-12.

[171]

C. Siciliano et al., "Evaluation of a Multilingual Synthetic Talking Faceas a Communication Aid for the Hearing Impaired," i Proceedings of the 15th International Congress of Phonetic Science (ICPhS'03), 2003, s. 131-134.

[172]

J. Beskow, O. Engwall och B. Granström, "Resynthesis of Facial and Intraoral Articulation fromSimultaneous Measurements," i Proceedings of the 15th International Congress of phonetic Sciences (ICPhS'03), 2003.

[173]

D. W. Massaro et al., "Picture My Voice : Audio to Visual Speech Synthesis using Artificial Neural Networks," i Proceedings of International Conference on Auditory-Visual Speech Processing, 1999, s. 133-138.

[174]

M. M. Cohen,, J. Beskow och D. W. Massaro, "RECENT DEVELOPMENTS IN FACIAL ANIMATION : AN INSIDE VIEW," i Proceedings of International Conference on Auditory-Visual Speech Processing, 1998, s. 201-206.

[175]

J. Beskow, "ANIMATION OF TALKING AGENTS," i Proceedings of International Conference on Auditory-Visual Speech Processing, 1997, s. 149-152.

[176]

J. Beskow, "RULE-BASED VISUAL SPEECH SYNTHESIS," i Proceedings of the 4th European Conference on Speech Communication and Technology, 1995, s. 299-302.

Kapitel i böcker

[177]

D. W. Massaro et al., "Animated speech : Research progress and applications," i Audiovisual Speech Processing, : Cambridge University Press, 2012, s. 309-345.

Avhandlingar

[178]

J. Beskow, "Talking Heads - Models and Applications for Multimodal Speech Synthesis," Doktorsavhandling : Institutionen för talöverföring och musikakustik, Trita-TMH, 2003:7, 2003.

Övriga

[179]

G. E. Henter, S. Alexanderson och J. Beskow, "Moglow : Probabilistic and controllable motion synthesis using normalising flows," (Manuskript).

[180]

T. Kucherenko, J. Beskow och H. Kjellström, "A neural network approach to missing marker reconstruction in human motion capture," , 2018.

[181]

P. Jonell et al., "Machine Learning and Social Robotics for Detecting Early Signs of Dementia," , 2017.

[182]

K. Stefanov et al., "Analysis and Generation of Candidate Gaze Targets in Multiparty Open-World Dialogues," (Manuskript).

[183]

K. Stefanov, J. Beskow och G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition," (Manuskript).

Senaste synkning med DiVA:

2026-01-21 23:15:45 UTC

Utbildning

Forskning

Samverkan

Om KTH

Bibliotek

Publikationer av Jonas Beskow

Refereegranskade

Artiklar

Konferensbidrag

Kapitel i böcker

Icke refereegranskade

Konferensbidrag

Kapitel i böcker

Avhandlingar

Övriga

Kontakt