Publications by Gustav Henter

Peer-reviewed

Articles

[1]
P. Wolfert, G. E. Henter and T. Belpaeme, "Exploring the Effectiveness of Evaluation Practices for Computer-Generated Nonverbal Behaviour," Applied Sciences, vol. 14, no. 4, 2024.
[2]
S. Nyatsanga et al., "A Comprehensive Review of Data-Driven Co-Speech Gesture Generation," Computer graphics forum (Print), vol. 42, no. 2, pp. 569-596, 2023.
[3]
S. Alexanderson et al., "Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models," ACM Transactions on Graphics, vol. 42, no. 4, 2023.
[4]
J. G. De Gooijer, G. E. Henter and A. Yuan, "Kernel-based hidden Markov conditional densities," Computational Statistics & Data Analysis, vol. 169, 2022.
[5]
T. Kucherenko et al., "Moving Fast and Slow : Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation," International Journal of Human-Computer Interaction, vol. 37, no. 14, pp. 1300-1316, 2021.
[7]
G. Valle-Perez et al., "Transflower : probabilistic autoregressive dance generation with multimodal attention," ACM Transactions on Graphics, vol. 40, no. 6, 2021.
[8]
G. E. Henter, S. Alexanderson and J. Beskow, "MoGlow : Probabilistic and controllable motion synthesis using normalising flows," ACM Transactions on Graphics, vol. 39, no. 6, pp. 1-14, 2020.
[9]
S. Alexanderson et al., "Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows," Computer graphics forum (Print), vol. 39, no. 2, pp. 487-496, 2020.
[10]
G. E. Henter and W. B. Kleijn, "Minimum entropy rate simplification of stochastic processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 12, pp. 2487-2500, 2016.
[11]
P. N. Petkov, G. E. Henter and W. B. Kleijn, "Maximizing Phoneme Recognition Accuracy for Enhanced Speech Intelligibility in Noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 1035-1045, 2013.
[12]
G. E. Henter and W. B. Kleijn, "Picking up the pieces : Causal states in noisy data, and how to recover them," Pattern Recognition Letters, vol. 34, no. 5, pp. 587-594, 2013.

Conference papers

[13]
P. Wolfert, G. E. Henter and T. Belpaeme, ""Am I listening?", Evaluating the Quality of Generated Data-driven Listening Motion," in ICMI 2023 Companion : Companion Publication of the 25th International Conference on Multimodal Interaction, 2023, pp. 6-10.
[14]
S. Wang et al., "A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS," in ICASSPW 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings, 2023.
[15]
P. Pérez Zarazaga, G. E. Henter and Z. Malisz, "A processing framework to access large quantities of whispered speech found in ASMR," in ICASSP 2023 : 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[16]
Y. Yoon et al., "GENEA Workshop 2023 : The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in ICMI 2023 : Proceedings of the 25th International Conference on Multimodal Interaction, 2023, pp. 822-823.
[17]
S. Mehta et al., "OverFlow : Putting flows on top of neural transducers for better TTS," in Interspeech 2023, 2023, pp. 4279-4283.
[18]
H. Lameris et al., "Prosody-Controllable Spontaneous TTS with Neural HMMs," in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.
[19]
P. Pérez Zarazaga et al., "Speaker-independent neural formant synthesis," in Interspeech 2023, 2023, pp. 5556-5560.
[20]
T. Kucherenko et al., "The GENEA Challenge 2023 : A large-scale evaluation of gesture generation models in monadic and dyadic settings," in Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023, 2023, pp. 792-801.
[21]
P. Wolfert et al., "GENEA Workshop 2022 : The 3rd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in ACM International Conference Proceeding Series, 2022, pp. 799-800.
[22]
T. Kucherenko et al., "Multimodal analysis of the predictability of hand-gesture properties," in AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022, pp. 770-779.
[23]
S. Mehta et al., "Neural HMMs are all you need (for high-quality attention-free TTS)," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 7457-7461.
[26]
Y. Yoon et al., "The GENEA Challenge 2022 : A large evaluation of data-driven co-speech gesture generation," in ICMI 2022 : Proceedings of the 2022 International Conference on Multimodal Interaction, 2022, pp. 736-747.
[27]
G. Beck et al., "Wavebender GAN : An architecture for phonetically meaningful speech manipulation," in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
[28]
T. Kucherenko et al., "A large, crowdsourced evaluation of gesture generation systems on common data : The GENEA Challenge 2020," in Proceedings IUI '21: 26th International Conference on Intelligent User Interfaces, 2021, pp. 11-21.
[29]
M. M. Sorkhei, G. E. Henter and H. Kjellström, "Full-Glow : Fully conditional Glow for more realistic image generation," in Pattern Recognition : 43rd DAGM German Conference, DAGM GCPR 2021, 2021, pp. 697-711.
[30]
T. Kucherenko et al., "GENEA Workshop 2021 : The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents," in Proceedings of ICMI '21: International Conference on Multimodal Interaction, 2021, pp. 872-873.
[31]
P. Jonell et al., "HEMVIP: Human Evaluation of Multiple Videos in Parallel," in ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 707-711.
[32]
S. Wang et al., "Integrated Speech and Gesture Synthesis," in ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, 2021, pp. 177-185.
[33]
T. Kucherenko et al., "Speech2Properties2Gestures : Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech," in IVA '21 : Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, 2021, pp. 145-147.
[34]
U. Wennberg and G. E. Henter, "The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models," in ACL-IJCNLP 2021 : The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol 2, 2021, pp. 130-140.
[35]
É. Székely et al., "Breathing and Speech Planning in Spontaneous Speech Synthesis," in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7649-7653.
[36]
S. Alexanderson et al., "Generating coherent spontaneous speech and gesture from text," in Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020, 2020.
[37]
T. Kucherenko et al., "Gesticulator : A framework for semantically-aware speech-driven gesture generation," in ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction, 2020.
[38]
P. Jonell et al., "Let’s face it : Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings," in IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, 2020.
[39]
K. Håkansson et al., "Robot-assisted detection of subclinical dementia : progress report and preliminary findings," in 2020 Alzheimer's Association International Conference. ALZ., 2020.
[40]
A. Ghosh et al., "Robust classification using hidden Markov models and mixtures of normalizing flows," in 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020.
[41]
S. Alexanderson and G. E. Henter, "Robust model training and generalisation with Studentising flows," in Proceedings of the ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2020, pp. 25:1-25:9.
[42]
T. Kucherenko et al., "Analyzing Input and Output Representations for Speech-Driven Gesture Generation," in 19th ACM International Conference on Intelligent Virtual Agents, 2019.
[43]
É. Székely, G. E. Henter and J. Gustafson, "Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector," in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6925-6929.
[44]
É. Székely et al., "How to train your fillers: uh and um in spontaneous speech synthesis," in The 10th ISCA Speech Synthesis Workshop, 2019.
[45]
É. Székely et al., "Off the cuff : Exploring extemporaneous speech delivery with TTS," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 3687-3688.
[46]
T. Kucherenko et al., "On the Importance of Representations for Speech-Driven Gesture Generation : Extended Abstract," in International Conference on Autonomous Agents and Multiagent Systems (AAMAS '19), May 13-17, 2019, Montréal, Canada, 2019, pp. 2072-2074.
[47]
P. Wagner et al., "Speech Synthesis Evaluation : State-of-the-Art Assessment and Suggestion for a Novel Research Program," in Proceedings of the 10th Speech Synthesis Workshop (SSW10), 2019.
[48]
É. Székely et al., "Spontaneous conversational speech synthesis from found data," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, pp. 4435-4439.
[49]
Z. Malisz et al., "The speech synthesis phoneticians need is both realistic and controllable," in Proceedings from FONETIK 2019, 2019.
[50]
O. Watts et al., "Where do the improvements come from in sequence-to-sequence neural TTS?," in Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019, pp. 217-222.
[51]
P. N. Petkov, W. B. Kleijn and G. E. Henter, "Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 166-169.
[52]
G. E. Henter, M. R. Frean and W. B. Kleijn, "Gaussian process dynamical models for nonparametric speech representation and synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 4505-4508.
[53]
G. E. Henter and W. B. Kleijn, "Intermediate-State HMMs to Capture Continuously-Changing Signal Features," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2011, pp. 1828-1831.
[54]
G. E. Henter and W. B. Kleijn, "Simplified Probability Models for Generative Tasks : a Rate-Distortion Approach," in Proceedings of the European Signal Processing Conference, 2010, pp. 1159-1163.

Non-peer-reviewed

Conference papers

[55]
H. Lameris et al., "Spontaneous Neural HMM TTS with Prosodic Feature Modification," in Proceedings of Fonetik 2022, 2022.

Theses

[56]
G. E. Henter, "Probabilistic Sequence Models with Speech and Language Applications," Doctoral thesis, Stockholm : KTH Royal Institute of Technology, Trita-EE, 2013:042, 2013.
Last synced with DiVA:
2024-05-07 00:14:02