Publications by Gustav Henter

Peer-reviewed

Articles

[1]

T. Kucherenko et al., "Evaluating Gesture Generation in a Large-scale Open Challenge : The GENEA Challenge 2022," ACM Transactions on Graphics, vol. 43, no. 3, 2024.

[2]

P. Wolfert, G. E. Henter and T. Belpaeme, "Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour," Applied Sciences, vol. 14, no. 4, 2024.

[3]

S. Nyatsanga et al., "A Comprehensive Review of Data-Driven Co-Speech Gesture Generation," Computer graphics forum (Print), vol. 42, no. 2, pp. 569-596, 2023.

[4]

S. Alexanderson et al., "Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models," ACM Transactions on Graphics, vol. 42, no. 4, 2023.

[5]

J. G. De Gooijer, G. E. Henter and A. Yuan, "Kernel-based hidden Markov conditional densities," Computational Statistics & Data Analysis, vol. 169, 2022.

[6]

T. Kucherenko et al., "Moving Fast and Slow : Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation," International Journal of Human-Computer Interaction, vol. 37, no. 14, pp. 1300-1316, 2021.

[7]

P. Jonell et al., "Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia : Clinical Feasibility and Preliminary Results," Frontiers in Computer Science, vol. 3, 2021.

[8]

G. Valle-Perez et al., "Transflower : probabilistic autoregressive dance generation with multimodal attention," ACM Transactions on Graphics, vol. 40, no. 6, 2021.

[9]

G. E. Henter, S. Alexanderson and J. Beskow, "MoGlow : Probabilistic and controllable motion synthesis using normalising flows," ACM Transactions on Graphics, vol. 39, no. 6, pp. 1-14, 2020.

[10]

S. Alexanderson et al., "Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows," Computer graphics forum (Print), vol. 39, no. 2, pp. 487-496, 2020.

[11]

G. E. Henter and W. B. Kleijn, "Minimum entropy rate simplification of stochastic processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 12, pp. 2487-2500, 2016.

[12]

P. N. Petkov, G. E. Henter and W. B. Kleijn, "Maximizing Phoneme Recognition Accuracy for Enhanced Speech Intelligibility in Noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 1035-1045, 2013.

[13]

G. E. Henter and W. B. Kleijn, "Picking up the pieces : Causal states in noisy data, and how to recover them," Pattern Recognition Letters, vol. 34, no. 5, pp. 587-594, 2013.

Conference papers

[14]

S. H. Bokkahalli Satish, G. E. Henter and É. Székely, "When Voice Matters : Evidence of Gender Disparity in Positional Bias of SpeechLLMs," in Speech and Computer - 27th International Conference, SPECOM 2025, Proceedings, 2026, pp. 25-38.

[15]

S. H. Bokkahalli Satish, G. E. Henter and É. Székely, "Hear Me Out : Interactive evaluation and bias discovery platform for speech-to-speech conversational AI," in Interspeech 2025, 2025, pp. 2151-2152.

[16]

V. S. Lodagala et al., "SawtArabi : A Benchmark Corpus for Arabic TTS. Standard, Dialectal and Code-Switching," in Interspeech 2025, 2025, pp. 4793-4797.

[17]

P. Tuttösí et al., "Take a Look, it's in a Book, a Reading Robot," in HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, 2025, pp. 1803-1805.

[18]

U. Wennberg and G. E. Henter, "Exploring Internal Numeracy in Language Models: A Case Study on ALBERT," in MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings, 2024, pp. 35-40.

[19]

S. Mehta et al., "Fake it to make it : Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1952-1964.

[20]

S. Mehta et al., "Fake it to make it : Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis," in Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024, 2024, pp. 1952-1964.

[21]

Y. Yoon et al., "GENEA Workshop 2024 : The 5th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in Proceedings of the 26th International Conference on Multimodal Interaction, ICMI 2024, 2024, pp. 694-695.

[22]

U. Wennberg and G. E. Henter, "Learned Transformer Position Embeddings Have a Low-Dimensional Structure," in ACL 2024 - 9th Workshop on Representation Learning for NLP, RepL4NLP 2024 - Proceedings of the Workshop, 2024, pp. 237-244.

[23]

S. Mehta et al., "Matcha-TTS: A fast TTS architecture with conditional flow matching," in 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings, 2024, pp. 11341-11345.

[24]

S. Mehta et al., "Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech," in Interspeech 2024, 2024, pp. 2285-2289.

[25]

S. Mehta et al., "Unified speech and gesture synthesis using flow matching," in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), 2024, pp. 8220-8224.

[26]

P. Wolfert, G. E. Henter and T. Belpaeme, "'Am I listening?', Evaluating the Quality of Generated Data-driven Listening Motion," in ICMI 2023 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, 2023, pp. 6-10.

[27]

S. Wang et al., "A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS," in ICASSPW 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings, 2023.

[28]

P. Pérez Zarazaga, G. E. Henter and Z. Malisz, "A processing framework to access large quantities of whispered speech found in ASMR," in ICASSP 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

[29]

J. J. Webber et al., "Autovocoder: Fast Waveform Generation from a Learned Speech Representation Using Differentiable Digital Signal Processing," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings, 2023.

[30]

S. Mehta et al., "Diff-TTSG : Denoising probabilistic integrated speech and gesture synthesis," in Proceedings 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, 2023, pp. 150-156.

[31]

Y. Yoon et al., "GENEA Workshop 2023 : The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in ICMI 2023 : Proceedings of the 25th International Conference on Multimodal Interaction, 2023, pp. 822-823.

[32]

S. Mehta et al., "OverFlow : Putting flows on top of neural transducers for better TTS," in Interspeech 2023, 2023, pp. 4279-4283.

[33]

H. Lameris et al., "Prosody-Controllable Spontaneous TTS with Neural HMMs," in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.

[34]

P. Pérez Zarazaga et al., "Speaker-independent neural formant synthesis," in Interspeech 2023, 2023, pp. 5556-5560.

[35]

T. Kucherenko et al., "The GENEA Challenge 2023 : A large-scale evaluation of gesture generation models in monadic and dyadic settings," in Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023, 2023, pp. 792-801.

[36]

P. Wolfert et al., "GENEA Workshop 2022 : The 3rd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in ACM International Conference Proceeding Series, 2022, pp. 799-800.

[37]

T. Kucherenko et al., "Multimodal analysis of the predictability of hand-gesture properties," in AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022, pp. 770-779.

[38]

S. Mehta et al., "Neural HMMs are all you need (for high-quality attention-free TTS)," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 7457-7461.

[39]

C. Valentini-Botinhao et al., "Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks," in INTERSPEECH 2022, 2022, pp. 471-475.

[40]

J. Fong et al., "Speech Audio Corrector : using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech," in INTERSPEECH 2022, 2022, pp. 1213-1217.

[41]

Y. Yoon et al., "The GENEA Challenge 2022 : A large evaluation of data-driven co-speech gesture generation," in ICMI 2022 : Proceedings of the 2022 International Conference on Multimodal Interaction, 2022, pp. 736-747.

[42]

G. Beck et al., "Wavebender GAN : An architecture for phonetically meaningful speech manipulation," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

[43]

T. Kucherenko et al., "A large, crowdsourced evaluation of gesture generation systems on common data : The GENEA Challenge 2020," in Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces, 2021, pp. 11-21.

[44]

M. M. Sorkhei, G. E. Henter and H. Kjellström, "Full-Glow : Fully conditional Glow for more realistic image generation," in Pattern Recognition : 43rd DAGM German Conference, DAGM GCPR 2021, 2021, pp. 697-711.

[45]

T. Kucherenko et al., "GENEA Workshop 2021 : The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in Proceedings of ICMI '21: International Conference on Multimodal Interaction, 2021, pp. 872-873.

[46]

P. Jonell et al., "HEMVIP: Human Evaluation of Multiple Videos in Parallel," in ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 707-711.

[47]

S. Wang et al., "Integrated Speech and Gesture Synthesis," in ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 177-185.

[48]

T. Kucherenko et al., "Speech2Properties2Gestures : Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech," in IVA '21 : Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, 2021, pp. 145-147.

[49]

U. Wennberg and G. E. Henter, "The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models," in ACL-IJCNLP 2021 : The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol 2, 2021, pp. 130-140.

[50]

É. Székely et al., "Breathing and Speech Planning in Spontaneous Speech Synthesis," in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7649-7653.

[51]

S. Alexanderson et al., "Generating coherent spontaneous speech and gesture from text," in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, 2020.

[52]

T. Kucherenko et al., "Gesticulator : A framework for semantically-aware speech-driven gesture generation," in ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction, 2020.

[53]

P. Jonell et al., "Let’s face it : Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings," in IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, 2020.

[54]

K. Håkansson et al., "Robot-assisted detection of subclinical dementia : progress report and preliminary findings," in 2020 Alzheimer's Association International Conference. ALZ., 2020.

[55]

A. Ghosh et al., "Robust classification using hidden Markov models and mixtures of normalizing flows," in 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020.

[56]

S. Alexanderson and G. E. Henter, "Robust model training and generalisation with Studentising flows," in Proceedings of the ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2020, pp. 25:1-25:9.

[57]

T. Kucherenko et al., "Analyzing Input and Output Representations for Speech-Driven Gesture Generation," in 19th ACM International Conference on Intelligent Virtual Agents, 2019.

[58]

É. Székely, G. E. Henter and J. Gustafson, "Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector," in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6925-6929.

[59]

É. Székely et al., "How to train your fillers: uh and um in spontaneous speech synthesis," in The 10th ISCA Speech Synthesis Workshop, 2019.

[60]

É. Székely et al., "Off the cuff : Exploring extemporaneous speech delivery with TTS," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 3687-3688.

[61]

T. Kucherenko et al., "On the Importance of Representations for Speech-Driven Gesture Generation : Extended Abstract," in International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada, 2019, pp. 2072-2074.

[62]

P. Wagner et al., "Speech Synthesis Evaluation : State-of-the-Art Assessment and Suggestion for a Novel Research Program," in Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019.

[63]

É. Székely et al., "Spontaneous conversational speech synthesis from found data," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 4435-4439.

[64]

Z. Malisz et al., "The speech synthesis phoneticians need is both realistic and controllable," in Proceedings from FONETIK 2019, 2019.

[65]

O. Watts et al., "Where do the improvements come from in sequence-to-sequence neural TTS?," in Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019, pp. 217-222.

[66]

P. N. Petkov, W. B. Kleijn and G. E. Henter, "Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 166-169.

[67]

G. E. Henter, M. R. Frean and W. B. Kleijn, "Gaussian process dynamical models for nonparametric speech representation and synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 4505-4508.

[68]

G. E. Henter and W. B. Kleijn, "Intermediate-State HMMs to Capture Continuously-Changing Signal Features," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2011, pp. 1828-1831.

[69]

G. E. Henter and W. B. Kleijn, "Simplified Probability Models for Generative Tasks : a Rate-Distortion Approach," in Proceedings of the European Signal Processing Conference, 2010, pp. 1159-1163.

Non-peer-reviewed

Conference papers

[70]

H. Lameris et al., "Spontaneous Neural HMM TTS with Prosodic Feature Modification," in Proceedings of Fonetik 2022, 2022.

Theses

[71]

G. E. Henter, "Probabilistic Sequence Models with Speech and Language Applications," Doctoral thesis, Stockholm : KTH Royal Institute of Technology, Trita-EE, 2013:042, 2013.

Other

[72]

S. Wang et al., "A comparative study of self-supervised speech representations in read and spontaneous TTS," (Manuscript).

[73]

G. E. Henter, S. Alexanderson and J. Beskow, "MoGlow : Probabilistic and controllable motion synthesis using normalising flows," (Manuscript).

[74]

G. E. Henter, A. Leijon and W. B. Kleijn, "Kernel Density Estimation-Based Markov Models with Hidden State," (Manuscript).

[75]

G. E. Henter and W. B. Kleijn, "Minimum Entropy Rate Simplification of Stochastic Processes," (Manuscript).

[76]

T. Kucherenko et al., "The GENEA Challenge 2020 : Benchmarking gesture-generation systems on common data," (Manuscript).

Last synchronized with DiVA:

2026-03-01 00:41:53 UTC