Hoppa till huvudinnehållet

Sök på KTH:s webbplats
Sök på Studentwebben Sök på KTH Intranät
English

Publikationer av Éva Székely

Refereegranskade

Artiklar

[1]

É. Székely et al., "Facial expression-based affective speech translation," Journal on Multimodal User Interfaces, vol. 8, no. 1, s. 87-96, 2014.

[2]

É. Székely et al., "Predicting synthetic voice style from facial expressions. An application for augmented conversations," Speech Communication, vol. 57, s. 63-75, 2014.

Konferensbidrag

[3]

S. H. Bokkahalli Satish, G. E. Henter och É. Székely, "When Voice Matters : Evidence of Gender Disparity in Positional Bias of SpeechLLMs," i Speech and Computer - 27th International Conference, SPECOM 2025, Proceedings, 2026, s. 25-38.

[4]

J. Francis, J. Gustafsson och É. Székely, "From Static to Dynamic : Enhancing AAC with Generative Imagery and Zero-Shot TTS," i Interspeech 2025, 2025, s. 4960-4962.

[5]

S. H. Bokkahalli Satish, G. E. Henter och É. Székely, "Hear Me Out : Interactive evaluation and bias discovery platform for speech-to-speech conversational AI," i Interspeech 2025, 2025, s. 2151-2152.

[6]

R. Jacka et al., "Impact Of Disfluent Speech Agent On Partner Models And Perspectve Taking," i CUI 2025 - Proceedings of the 2025 ACM Conference on Conversational User Interfaces, 2025.

[7]

É. Székely et al., "Voice Reconstruction through Large-Scale TTS Models : Comparing Zero-Shot and Fine-tuning Approaches to Personalise TTS in Assistive Communication," i Interspeech 2025, 2025, s. 2735-2739.

[8]

H. Lameris, J. Gustafsson och É. Székely, "VoiceQualityVC : A Voice Conversion System for Studying the Perceptual Effects of Voice Quality in Speech," i Interspeech 2025, 2025, s. 2295-2299.

[9]

M. Hope och É. Székely, "Voices of 'cyborg awesomeness' : Posthuman embodiment of nonbinary gender expression in AI speech technologies," i Interspeech 2025, 2025, s. 689-693.

[10]

D. Puhach, A. H. Payberah och É. Székely, "Who Gets the Mic? Investigating Gender Bias in the Speaker Assignment of a Speech-LLM," i Interspeech 2025, 2025, s. 2058-2062.

[11]

J. O'Mahony, C. Lai och É. Székely, ""Well", what can you do with messy data? Exploring the prosody and pragmatic function of the discourse marker "well" with found data and speech synthesis," i Interspeech 2024, 2024, s. 4084-4088.

[12]

É. Székely och M. Hope, "An inclusive approach to creating a palette of synthetic voices for gender diversity," i Interspeech 2024, 2024, s. 3070-3074.

[13]

É. Székely och M. Hope, "An inclusive approach to creating a palette of synthetic voices for gender diversity," i Proc. Interspeech 2024, 2024, s. 3070-3074.

[14]

J. Francis, É. Székely och J. Gustafsson, "ConnecTone : A Modular AAC System Prototype with Contextual Generative Text Prediction and Style-Adaptive Conversational TTS," i Interspeech 2024, 2024, s. 1001-1002.

[15]

S. Wang, É. Székely och J. Gustafsson, "Contextual Interactive Evaluation of TTS Models in Dialogue Systems," i Interspeech 2024, 2024, s. 2965-2969.

[16]

H. Lameris, J. Gustafsson och É. Székely, "CreakVC : A Voice Conversion Tool for Modulating Creaky Voice," i Interspeech 2024, 2024, s. 1005-1006.

[17]

S. Wang och É. Székely, "Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model," i 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 2024, s. 6464-6474.

[18]

S. Mehta et al., "MATCHA-TTS: A FAST TTS ARCHITECTURE WITH CONDITIONAL FLOW MATCHING," i 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings, 2024, s. 11341-11345.

[19]

S. Mehta et al., "Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech," i Interspeech 2024, 2024, s. 2285-2289.

[20]

H. Lameris, É. Székely och J. Gustafsson, "The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS," i 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 2024, s. 16058-16065.

[21]

S. Mehta et al., "Unified speech and gesture synthesis using flow matching," i 2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, s. 8220-8224.

[22]

É. Székely, J. Higginbotham och F. Possemato, "Voice and choice: Investigating the role of prosodic variation in request compliance and perceived politeness using conversational TTS," i Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2024, s. 466-476.

[23]

S. Wang et al., "A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS," i ICASSPW 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings, 2023.

[24]

E. Ekstedt et al., "Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023, 2023, s. 5481-5485.

[25]

H. Lameris, J. Gustafsson och É. Székely, "Beyond style : synthesizing speech with pragmatic functions," i Interspeech 2023, 2023, s. 3382-3386.

[26]

I. Torre et al., "Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?," i 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023, s. 106-112.

[27]

J. Gustafsson et al., "Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters," i 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition, FG 2023, 2023.

[28]

S. Mehta et al., "Diff-TTSG : Denoising probabilistic integrated speech and gesture synthesis," i Proceedings 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, 2023, s. 150-156.

[29]

J. Gustafsson, É. Székely och J. Beskow, "Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters," i 23rd ACM International Conference on Interlligent Virtual Agent (IVA 2023), 2023.

[30]

J. Miniotaitė et al., "Hi robot, it's not what you say, it's how you say it," i 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023, s. 307-314.

[31]

H. Lameris et al., "Neural speech synthesis with controllable creaky voice style," i Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023, 2023, s. 3141-3145.

[32]

S. Mehta et al., "OverFlow : Putting flows on top of neural transducers for better TTS," i Interspeech 2023, 2023, s. 4279-4283.

[33]

A. Kirkland, J. Gustafsson och É. Székely, "Pardon my disfluency : The impact of disfluency effects on the perception of speaker competence and confidence," i Interspeech 2023, 2023, s. 5217-5221.

[34]

H. Lameris et al., "Prosody-Controllable Spontaneous TTS with Neural HMMs," i International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.

[35]

É. Székely, J. Gustafsson och I. Torre, "Prosody-controllable gender-ambiguous speech synthesis : a tool for investigating implicit bias in speech perception," i Interspeech 2023, 2023, s. 1234-1238.

[36]

H. Lameris et al., "Situating speech synthesis : Investigating contextual factors in the evaluation of conversational TTS," i Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France, August 26–28, 2023, 2023, s. 69-74.

[37]

É. Székely, S. Wang och J. Gustafsson, "So-to-Speak : an exploratory platform for investigating the interplay between style and prosody in TTS," i Interspeech 2023, 2023, s. 2016-2017.

[38]

M. Elmers, J. O'Mahony och É. Székely, "Synthesis after a couple PINTs : Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception," i Interspeech 2023, 2023, s. 4843-4847.

[39]

M. P. Aylett et al., "Why is my Agent so Slow? Deploying Human-Like Conversational Turn-Taking," i HAI 2023 - Proceedings of the 11th Conference on Human-Agent Interaction, 2023, s. 490-492.

[40]

S. Wang, J. Gustafsson och É. Székely, "Evaluating Sampling-based Filler Insertion with Spontaneous TTS," i LREC 2022 : Thirteen International Conference On Language Resources And Evaluation, 2022, s. 1960-1969.

[41]

S. Mehta et al., "Neural HMMs are all you need (for high-quality attention-free TTS)," i 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, s. 7457-7461.

[42]

N. Ward et al., "Two Pragmatic Functions of Breathy Voice in American English Conversation," i Proceedings 11th International Conference on Speech Prosody, 2022, s. 82-86.

[43]

A. Kirkland et al., "Where's the uh, hesitation? : The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence," i INTERSPEECH 2022, 2022, s. 4990-4994.

[44]

S. Wang et al., "Integrated Speech and Gesture Synthesis," i ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, s. 177-185.

[45]

A. Kirkland et al., "Perception of smiling voice in spontaneous speech synthesis," i Proceedings of Speech Synthesis Workshop (SSW11), 2021, s. 108-112.

[46]

É. Székely, J. Edlund och J. Gustafsson, "Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis," i Proceedings of The 12th Language Resources and Evaluation Conference, 2020, s. 6368-6374.

[47]

É. Székely et al., "Breathing and Speech Planning in Spontaneous Speech Synthesis," i 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, s. 7649-7653.

[48]

S. Alexanderson et al., "Generating coherent spontaneous speech and gesture from text," i Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, 2020.

[49]

É. Székely, G. E. Henter och J. Gustafson, "Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector," i 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, s. 6925-6929.

[50]

É. Székely et al., "How to train your fillers: uh and um in spontaneous speech synthesis," i The 10th ISCA Speech Synthesis Workshop, 2019.

[51]

L. Clark et al., "Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions," i CHI EA '19 EXTENDED ABSTRACTS : EXTENDED ABSTRACTS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2019.

[52]

É. Székely et al., "Off the cuff : Exploring extemporaneous speech delivery with TTS," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, s. 3687-3688.

[53]

P. Wagner et al., "Speech Synthesis Evaluation : State-of-the-Art Assessment and Suggestion for a Novel Research Program," i Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019.

[54]

É. Székely et al., "Spontaneous conversational speech synthesis from found data," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, s. 4435-4439.

[55]

S. Betz et al., "The greennn tree - lengthening position influences uncertainty perception," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019, 2019, s. 3990-3994.

[56]

É. Székely, P. Wagner och J. Gustafson, "THE WRYLIE-BOARD: MAPPING ACOUSTIC SPACE OF EXPRESSIVE FEEDBACK TO ATTITUDE MARKERS," i Proc. IEEE Spoken Language Technology conference, 2018.

[57]

E. Szekely, J. Mendelson och J. Gustafson, "Synthesising uncertainty : The interplay of vocal effort and hesitation disfluencies," i Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017, s. 804-808.

[58]

B. R. Cowan et al., "They Know as Much as We Do : Knowledge Estimation and Partner Modelling of Artificial Partners," i CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition, 2017, s. 1836-1841.

[59]

C. Oertel et al., "Using crowd-sourcing for the design of listening agents : Challenges and opportunities," i ISIAA 2017 - Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, Co-located with ICMI 2017, 2017, s. 37-38.

[60]

É. Székely, M. T. Keane och J. Carson-Berndsen, "The Effect of Soft, Modal and Loud Voice Levels on Entrainment in Noisy Conditions," i Sixteenth Annual Conference of the International Speech Communication Association, 2015.

[61]

Z. Ahmed et al., "A system for facial expression-based affective speech translation," i Proceedings of the companion publication of the 2013 international conference on Intelligent user interfaces companion, 2013, s. 57-58.

[62]

É. Székely et al., "Detecting a targeted voice style in an audiobook using voice quality features," i Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, s. 4593-4596.

[63]

É. Székely et al., "Evaluating expressive speech synthesis from audiobooks in conversational phrases," i International Conference on Language Resources and Evaluation. MAY 21-27, 2012., 2012, s. 3335-3339.

[64]

É. Székely et al., "Facial expression as an input annotation modality for affective speech-to-speech translation," i Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, 2012.

[65]

M. Abou-Zleikha et al., "Multi-level exemplar-based duration generation for expressive speech synthesis," i Proceedings of Speech Prosody, 2012.

[66]

J. P. Cabral et al., "Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz.," i Proceedings of the International Conference on Language Resources and Evaluation, 2012, s. 4136-4142.

[67]

É. Székely et al., "Synthesizing expressive speech from amateur audiobook recordings," i Spoken Language Technology Workshop (SLT), 2012, s. 297-302.

[68]

J. P. Cabral et al., "Using the Wizard-of-Oz Framework in a Pronunciation Training System for Providing User Feedback and Instructions," i IS ADEPT, 2012.

[69]

É. Székely et al., "WinkTalk : a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices," i Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies, 2012, s. 5-8.

[70]

É. Székely et al., "Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters.," i 12th Annual Conference of the International-Speech-Communication-Association 2011 (INTERSPEECH 2011), 2011, s. 2409-2412.

[71]

P. Cahill et al., "Ucd blizzard challenge 2011 entry," i Proceedings of the Blizzard Challenge Workshop, 2011.

Icke refereegranskade

Konferensbidrag

[72]

H. Lameris et al., "Spontaneous Neural HMM TTS with Prosodic Feature Modification," i Proceedings of Fonetik 2022, 2022.

Övriga

[73]

S. Wang et al., "A comparative study of self-supervised speech representationsin read and spontaneous TTS," (Manuskript).

Senaste synkning med DiVA:

2026-03-05 23:07:22 UTC