Projects

Permanent Activity

MSc projects at KTH in collaboration with industry (since 2007)

Many MSc projects at KTH are conducted in collaboration with industry or other universities and research institutes. I have the pleasure of examining or supervising a number of such projects every year. Such external collaborations are always stimulating, and often lead to publications and further collaboration.

Recent publications with external MSc students

Marc Botet Colomer*, Pier Luigi Dovesi*, Theodoros Panagiotakopoulos, J. Frederico Carvalho, Linus Härenstam-Nielsen, Hossein Azizpour, Hedvig Kjellström, Daniel Cremers, and Matteo Poggi. 💀 To Adapt or Not to Adapt? Real-time adaptation for semantic segmentation. In IEEE International Conference on Computer Vision, 2023. (*Joint first authors)
- Videos and code
Jonathan Wenger, Hedvig Kjellström, and Rudolph Triebel. Non-parametric calibration for classification. In International Conference on Artificial Intelligence and Statistics, 2020.

Current Projects

Evaluation of generative models (WASP 2024-present)

This project is part of the WARA Media and Language and a collaboration with the company Electronic Arts.

Generative models will revolutionize many industries and professions, with applications like programming assistants already in use. This raises a need for reliable and automated metrics that measure, for example, method robustness and appropriateness. Understanding quality is particularly crucial in domains less intuitive to the average user than images and text, which might require expert evaluation of each generated sample. Currently, only a few automated metrics exist, and their correlation with human judgment is debatable.
This project aims to design and evaluate reliable and aligned automated metrics for generative models trained on content in domains relevant to the gaming industry. The development of computer and mobile games needs the creation content such as animations, sound effects, and dialogue. The growing demand for a continuous stream of new content, coupled with the availability of user-generated content, has raised the interest in machine learning-driven solutions for automatic content generation. Such domain-specific metrics are highly needed for model development in the gaming industry to enable more rigorous testing and comparison between models.

Collaborators

Yifan Lu (PhD student)
Judith Bütepage (Electronic Arts, Sweden)

Publications

ANITA: ANImal TrAnslator (VR 2024-present)

Beagle

In recent years there has been an explosive growth in neural network based algorithms for interpretation and generation of natural language. One task that has been addressed successfully using neural approaches is machine translation from one language to another.
The goal of the project proposed here is to make an automated interpreter of animal communicative behavior to human language, in order to allow humans to get an insight into the mind of animals in their care. In this project we focus on the dog species, and will potentially explore common denominators with other species such as horse and cattle.
Despite the high potential impact, automated animal behavior recognition is still an undeveloped field. The reason is not the signal itself; the dog species has developed for thousands of years together with humans, and have rich communication and interaction both with humans and with other animal individuals. The main difference from the human language field is instead the lack of data; large data volume is a key success factor in training large neural language models. Thus, data collection is an important venture in this project. Using this data, we propose to develop a deep generative approach to animal behavior recognition from video.

Collaborators

Theo Wieland (PhD student)
Elin Hernlund (Swedish University of Agricultural Sciences, Sweden)
Linda Keeling (Swedish University of Agricultural Sciences, Sweden)

Publications

OrchestrAI: Deep generative models of the communication between conductor and orchestra (WASP, SeRC 2023-present)

In this project, which is part of the WARA Media and Language and a collaboration with the Max Planck Institute for Empirical Aesthetics, we build computer models of the processes by which humans communicate in music performance, and use these to 1) learn about the underlying processes, 2) build different kinds of interactive applications. We focus on the communication between a conductor and an orchestra, a process based on the non-verbal communication of cues and instructions via the conductor’s hand, arm and upper body motion, as well as facial expression and gaze pattern.

In the first part of the project, a museum installation funded by SeRC is designed for the omni-theater Wisdome Stockholm at Tekniska Museet, in collaboration with Berwaldhallen/Swedish Radio Symphony Orchestra and IVAR Studios.

Collaborators

Mert Mermerci (PhD student)

Publications

Mert Mermerci and Hedvig Kjellström. Creating an immersive virtual orchestra conducting experience. In CVPR Workshop on Computer Vision for Fashion, Art, and Design, 2024.

The relation between motion and cognition in infants (SeRC 2023-present)

In this project, which is part of the SeRC Data Science MCP and a collaboration with the Department of Women’s and Children’s health at Karolinska Institutet, we study the relation between motion patterns and cognition and brain function in infants . The currently primary application is detection of motor conditions in neonates, but we will also study more general connections between motion and future development of cognition and language.

Collaborators

Gustaf Mårtensson (Mycronic and Karolinska Institutet, Sweden)
Henglin Shi (Linköping University, Sweden)

Publications

Generative AI for the creation of artificial spiderweb (WASP, DDLS 2023-present)

This project is a collaboration with a group in veterinary biochemistry at the Swedish University of Agricultural Sciences, whose goal is to create artificial spider web - a protein based, highly durable and strong material. Our contribution to the project is to develop deep generative methods for predicting material properties given the protein composition in the raw material.

Collaborators

Neeru Dubey (Post doc)
Anna Rising (Swedish University of Agricultural Sciences, Sweden)

Publications

Neeru Dubey, Elin Karlsson, Miguel Angel Redondo, Johan Reimegård, Anna Rising, and Hedvig Kjellström. Customizing spider silk: Generative models with mechanical property conditioning for protein engineering. Transactions on Machine Learning Research 7, 2025.

UNCOCO: UNCOnscious COmmunication (WASP 2023-present)

This project, which is part of the WARA Media and Language and a collaboration with the Perceptual Neuroscience group at KI, entails two contributions.

Firstly, we develop a 3D embodied, integrated representation of head pose, gaze and facial micro expression, that can be extracted from a regular 60 Hz video camera and a desk-mounted gaze sensor. The embodied, integrated 3D representation of head pose, gaze and facial micro expression provides a preprocessing step to the second contribution, a deep generative model for inferring the latent emotional state of the human from the non-verbal communicative behavior. The model is employed in three different contexts: 1) estimating user affect for a digital avatar, 2) analyzing human non-verbal behavior connected to sensor stimuli, e.g., quantify approach/avoidance motor response to smell, 3) estimating frustration in a driving scenario.

Collaborators

Chen Ling (PhD student)
Henglin Shi (Linköping University, Sweden)

Publications

STING: Synthesis and analysis with Transducers and Invertible Neural Generators (WASP 2022-present)

Human communication is multimodal in nature, and occurs through combinations of speech,
language, gesture, facial expression, and similar signals. To enable natural interactions with human beings, artificial agents must be capable of both analysing and producing these rich and
interdependent signals, and connect them to their semantic implications. Unfortunately, even the strongest machine learning methods currently fall short of this goal: automated semantic understanding of human behaviour remains superficial, and generated agent behaviours are empty gestures lacking the ability to convey meaning and communicative intent.

The STING NEST, part of the WARA Media and Language, intends to change this state of affairs by uniting synthesis and analysis with transducers and invertible neural models. This involves connecting concrete, continuous valued sensory data such as images, sound, and motion, with high level, predominantly discrete, representations of meaning, which has the potential to endow synthesis output with human understandable highlevel explanations, while simultaneously improving the ability to attach probabilities to semantic representations. The bidirectionality also allows us to create efficient mechanisms for explainability, and to inspect and enforce fairness in the models.
Recent advances in generative models suggest that our ambitious research agenda is likely to be met with success. Normalising flows and variational autoencoders permit both extracting disentangled representations of observations, and (re-)generating observations from these abstract representations, all within a single model. Their recent extensions to graph structured data are of particular interest because graphs are commonly used semantic representations.
This opens the door not only to generating structured information, but also to capturing the composition of the generation itself (which is a graph in its own right) by exploiting and transferring techniques from finite state transducers and graph grammars.

Collaborators

Siyuan Yang (Post doc)
Silvia Arellano García (PhD student)
Gustav Eje Henter (KTH)
Johanna Björklund (Umeå University, Sweden)

Publications

Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, and Mårten Björkman. Controllable motion synthesis and reconstruction with autoregressive diffusion models. In IEEE International Conference on Robot and Human Interactive Communication, 2023.

Project home page

MARTHA: MARkerless 3D capTure for Horse motion Analysis (KTH, FORMAS 2020-present)

Lameness in horses is a sign of disease or injury, and associated with experiences of pain for the animal. Changed behavior, body pose and motion pattern accompany lameness and many other diseases. Caretakers have difficulties in recognizing early signs of disease, since these signs are subtle and since efficient observation of animals takes resources and time. At the same time, it is crucial to detect these early signs, as the injury that cause them is easy to treat at this early stage, but will potentially be lethal if untreated.

A system for automatic detection of lameness would therefore have a huge positive impact on horse welfare.
This project aims at developing such a system. We propose a Computer Vision and Machine Learning method, MARTHA, that reconstructs the 3D horse shape, pose and motion from video, and recognizes deviations lameness and pain-related behavior from this model.

The developed method will require data. Firstly, we will record 3D motion capture data and time-correlated video in controlled indoor settings. This data will be used to train and evaluate the developed 3D shape, pose and motion model. Secondly, we will record, as well as leverage previous recordings of, horses with induced lameness or with clinically obtained ground truth lameness diagnoses. This data will be used to train methods for recognition and localization of orthopaedic injuries, and for detection of pain-related behavior.

Collaborators

Pia Haubro Andersen (Swedish University of Agricultural Sciences, Sweden)
Michael Black (Max Planck Institute for Intelligent Systems, Germany)
Elin Hernlund (Swedish University of Agricultural Sciences, Sweden)
Silvia Zuffi (IMATI-CNR, Italy)

Publications

Ci Li, Yi Yang, Zehang Weng, Elin Hernlund, Silvia Zuffi, and Hedvig Kjellström. Dessie: Disentanglement for articulated 3D horse shape and pose estimation from images. In Asian Conference on Computer Vision, 2024.
- Videos and code
Ci Li, Elin Hernlund, Hedvig Kjellström, and Silvia Zuffi. CLHOP: Combined audio-video learning for horse 3D pose and shape estimation. In CVPR Workshop on Computer Vision for Animal Behavior Tracking and Modeling, 2024.
Ci Li, Ylva Mellbin, Johanna Krogager, Senya Polikovsky, Martin Holmberg, Nima Ghorbani, Michael J. Black, Hedvig Kjellström, Silvia Zuffi, and Elin Hernlund. The poses for equine research dataset (PFERD). Nature Scientific Data 11, 497, 2024.
- Data, videos and code
Silvia Zuffi, Ylva Mellbin, Ci Li, Markus Hoeschle, Hedvig Kjellström, Senya Polikovsky, Elin Hernlund, and Michael J. Black. VAREN: Very accurate and realistic equine network. In IEEE Conference on Computer Vision and Pattern Recognition, 2024.
- Videos and code
Felix Järemo Lawin, Anna Byström, Christoffer Roepstorff, Marie Rhodin, Mattias Almlöf, Mudith Silva, Pia Haubro Andersen, Hedvig Kjellström, and Elin Hernlund. Is Markerless More or Less? Comparing a smartphone computer vision method for equine lameness assessment to multi-camera motion capture. Animals 13(3), 2023.
Ci Li, Elin Hernlund, Hedvig Kjellström, and Silvia Zuffi. The Sound of Motion: Multimodal horse motion estimation from video and audio. In CVPR Workshop on Sight and Sound, 2022.
Hedvig Kjellström. AI and Machine Learning methods in equine research. Invited short paper in International Conference on Equine Exercise Physiology, 2022.
Ci Li, Nima Ghorbani, Sofia Broomé, Maheen Rashid, Michael J. Black, Elin Hernlund, Hedvig Kjellström, and Silvia Zuffi. hSMAL: Detailed horse shape and pose reconstruction for motion pattern recognition. In CVPR Workshop on Computer Vision for Animal Behavior Tracking and Modeling, 2021.

Past Projects

HiSS: Humanizing the Sustainable Smart city (Digital Futures 2019-2024)

The overarching objective is to improve our understanding of how human social behavior shapes sustainable smart city design and development through multi-level interactions between humans and cyber agents. Our key hypothesis is that human-social wellbeing is the main driver of smart city development, strongly influencing human-social choices and behaviors. This hypothesis will be substantiated through mathematical and computational modelling, which spans and links multiple scales, and tested by means of several case studies set in the Stockholm region as part of the Digital Demo Stockholm initiative.

Collaborators

Pawel Herman (KTH)
Karl Henrik Johansson (KTH)

Publications

Carles Balsells-Rodas, Xavier Sumba, Tanmayee Narendra, Ruibo Tu, Gabriele Schweikert, Hedvig Kjellström, and Yingzhen Li. Causal discovery from conditionally stationary time series, in International Conference on Machine Learning, 2025.
Andrew Karvonen, Vladimir Cvetkovic, Pawel Herman, Karl Henrik Johansson, Hedvig Kjellström, Marco Molinari, and Mikael Skoglund. The 'New Urban Science': Towards the interdisciplinary and transdisciplinary pursuit of sustainable transformations. Urban Transformations 3:9, 2021.
Chenda Zhang and Hedvig Kjellström. A subjective model of human decision making based on Quantum Decision Theory, arXiv:2101.05851, 2021.
Moein Sorkhei, Gustav Eje Henter, and Hedvig Kjellström. Full-Glow: Fully conditional Glow for more realistic image generation. In DAGM German Conference on Pattern Recognition, 2021.

Project homepage

Variational Approximations and Inference for survival analysis and joint modeling (SeRC 2019-2022)

This project, which is part of the SeRC eCPC MCP, is a collaboration with the Biostatistics group at the Department of Medical Epidemiology and Biostatistics. Together we develop Variational methods for more accurate prediction of survival probability.

Collaborators

Mark Clements (Karolinska Institutet, Sweden)
Keith Humphreys (Karolinska Institutet, Sweden)

Publications

Benjamin Christoffersen, Behrang Mahjani, Mark Clements, Hedvig Kjellström, and Keith Humphreys. Quasi-Monte Carlo methods for binary event models with complex family data. Journal of Computational and Graphical Statistics 1-9, 2023.
Benjamin Christoffersen, Mark Clements, Keith Humphreys, and Hedvig Kjellström. Asymptotically exact and fast Gaussian copula models for imputation of mixed data types. In Asian Conference on Machine Learning, 2021.

AIVIA: AI for Viable Cities (VINNOVA 2019-2020)

This is a subproject of the VINNOVA project Viable Cities, where we wrote a series of popular scientific essays on AI and cities of the future.
Magnus Boman. Att framtidsskriva en stad, Medium, September 11, 2019.
Magnus Boman. Strategier för utveckling av framtidens artificiella intelligens, Medium, October 3, 2019.
Hedvig Kjellström. Från Big Data till Small Data - data är inte gratis, Medium, October 17, 2019.
Magnus Boman. Ghost work powering AI-based services, Medium, January 13, 2020.
Hedvig Kjellström. Singulariteten... Är det verkligen den mest angelägna frågan?, Medium, February 5, 2020.

Collaborators

Magnus Boman (Karolinska Institutet, Sweden)

EquineML: Machine Learning methods for recognition of the pain expressions of horses (VR, FORMAS 2017-2022)

Recognition of pain in horses and other animals is important, because pain is a manifestation of disease and decreases animal welfare. Pain diagnostics for humans typically includes self-evaluation and location of the pain with the help of standardized forms, and labeling of the pain by an clinical expert using pain scales. However, animals cannot verbalize their pain as humans can, and the use of standardized pain scales is challenged by the fact that animals as horses and cattle, being prey animals, display subtle and less obvious pain behavior - it is simply beneficial for a prey animal to appear healthy, in order lower the interest from predators. The aim of this project is to develop methods for automatic recognition of pain in horses, with the help of Computer Vision.

Collaborators

Pia Haubro Andersen (Swedish University of Agricultural Sciences, Sweden)
Elin Hernlund (Swedish University of Agricultural Sciences, Sweden)

Publications

Ernest Pokropek, Sofia Broomé, Pia Haubro Andersen, and Hedvig Kjellström. Predictive modeling of equine activity budgets using a 3D skeleton reconstructed from surveillance recordings. In CVPR Workshop on Computer Vision for Animal Behavior Tracking and Modeling, 2023.
Sofia Broomé, Marcelo Feighelstein, Anna Zamansky, Gabriel Carreira Lencioni, Pia Haubro Andersen, Francisca Pessanha, Marwa Mahmoud, Hedvig Kjellström, and Albert Ali Salah. Going deeper than tracking: A survey of computer-vision based recognition of animal pain and affective state. International Journal of Computer Vision 131, 2023.
Sofia Broomé, Ernest Pokropek, Boyu Li, and Hedvig Kjellström. Recur, Attend or Convolve? Frame dependency modeling matters for cross-domain robustness in action recognition. In IEEE Winter Conference on Applications of Computer Vision, 2023.
- Dataset
Sofia Broomé, Katrina Ask, Maheen Rashid, Pia Haubro Andersen, and Hedvig Kjellström. Sharing pain: Using pain domain transfer for video recognition of low grade orthopedic pain in horses. PLOS ONE 17(3):e0263854, 2022.
Maheen Rashid, Sofia Broomé, Katrina Ask, Elin Hernlund, Pia Haubro Andersen, Hedvig Kjellström, and Yong Jae Lee. Equine pain behavior classification via self-supervised disentangled pose representation. In IEEE Winter Conference on Applications of Computer Vision, 2022.
Pia Haubro Andersen, Sofia Broomé, Maheen Rashid, Johan Lundblad, Katrina Ask, Zhenghong Li, Elin Hernlund, Marie Rhodin, and Hedvig Kjellström. Towards machine recognition of facial expressions of pain in horses. Animals 11(6), 2021.
Zhenghong Li, Sofia Broomé, Pia Haubro Andersen, and Hedvig Kjellström. Automated detection of equine facial action units, arXiv:2102.08983, 2021.
Joonatan Mänttäri*, Sofia Broomé*, John Folkesson, and Hedvig Kjellström. Interpreting video features: A comparison of 3D convolutional networks and convolutional LSTM networks. In Asian Conference on Computer Vision, 2020. (*Joint first authors)
- Videos and figures
Maheen Rashid, Hedvig Kjellström, and Yong Jae Lee. Action graphs: Weakly-supervised action localization with graph convolution networks. In IEEE Winter Conference on Applications of Computer Vision, 2020.
Sofia Broomé, Karina Bech Gleerup, Pia Haubro Andersen, and Hedvig Kjellström. Dynamics are important for the recognition of equine pain in video. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
Pia Haubro Andersen, Karina Bech Gleerup, Jennifer Wathan, Britt Coles, Hedvig Kjellström, Sofia Broomé, Yong Jae Lee, Maheen Rashid, Claudia Sonder, Erika Rosenberg, and Deborah Forster. Can a machine learn to see horse pain? An interdisciplinary approach towards automated decoding of facial expressions of pain in the horse. In International Conference on Methods and Techniques in Behavioral Research, 2018.

Project home page

Automatic visual understanding for visually impaired persons (Promobilia 2017-2022)

In this project, we aim to build an automatic image understanding system that can perform real-time robust image understanding to aid visually impaired persons.

Collaborators

Cheng Zhang (Ellison Institute of Technology Oxford, UK)

Publications

Marcus Klasson, Hedvig Kjellström, and Cheng Zhang. Learn the time to learn: Replay scheduling in continual learning. Transactions on Machine Learning Research 09, 2023.
Marcus Klasson, Cheng Zhang, and Hedvig Kjellström. Using variational multi-view learning for classification of grocery items. Patterns 1(8):100143, 2020.
- Supplementary material
- Dataset
Marcus Klasson, Cheng Zhang, and Hedvig Kjellström. A hierarchical grocery store image dataset with visual and semantic labels. In IEEE Winter Conference on Applications of Computer Vision, 2019.
- Dataset

Causal Healthcare (SeRC, KTH 2016-2022)

	In this project, which is part of the SeRC Data Science MCP, we develop Machine Learning methods that discover causal structures from medical data, for automatic decision support to medical doctors in their work to diagnose different types of injuries and illnesses. The purpose is to determine the underlying causes of observed symptoms and measurements, making it possible to semi-automatically reason about the potential effects of different actions, and propose suitable treatment. Collaborators Cheng Zhang (Ellison Institute of Technology Oxford, UK) Publications Ruibo Tu, Kun Zhang, Hedvig Kjellström, and Cheng Zhang. Optimal transport for causal discovery. In International Conference on Learning Representations, 2022. Xueru Zhang, Ruibo Tu, Yang Liu, Mingyan Liu, Hedvig Kjellström, Kun Zhang, and Cheng Zhang. How do fair decisions fare in long-term qualification? In Neural Information Processing Systems, 2020. (Joint first authors) Video* Code Ruibo Tu, Kun Zhang, Bo Christer Bertilson, Hedvig Kjellström, and Cheng Zhang. Neuropathic pain diagnosis simulator for causal discovery algorithm evaluation. In Neural Information Processing Systems, 2019. Video Code Cheng Zhang, Judith Bütepage, Hedvig Kjellström, and Stephan Mandt. Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(8), 2019 Charles Hamesse, Ruibo Tu, Paul Ackermann, Hedvig Kjellström, and Cheng Zhang. Simultaneous measurement imputation and outcome prediction for Achilles tendon rupture rehabilitation. In Machine Learning for Healthcare, 2019. Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Clark Glymour, Hedvig Kjellström, and Kun Zhang. Causal discovery in the presence of missing data. In International Conference on Artificial Intelligence and Statistics, 2019. Code Cheng Zhang, Hedvig Kjellström, and Stephan Mandt. Determinantal point processes for mini-batch diversification. In Conference on Uncertainty in Artificial Intelligence, 2017. Cheng Zhang, Hedvig Kjellström, and Carl Henrik Ek. Inter-battery topic representation learning. In European Conference on Computer Vision, 2016. Cheng Zhang, Hedvig Kjellström, Carl Henrik Ek, and Bo Christer Bertilson. Diagnostic prediction using discomfort drawings with IBTM. In Machine Learning for Healthcare, 2016.

In this project, which is part of the SeRC Data Science MCP, we develop Machine Learning methods that discover causal structures from medical data, for automatic decision support to medical doctors in their work to diagnose different types of injuries and illnesses. The purpose is to determine the underlying causes of observed symptoms and measurements, making it possible to semi-automatically reason about the potential effects of different actions, and propose suitable treatment.

Collaborators

Cheng Zhang (Ellison Institute of Technology Oxford, UK)

Publications

Ruibo Tu, Kun Zhang, Hedvig Kjellström, and Cheng Zhang. Optimal transport for causal discovery. In International Conference on Learning Representations, 2022.
Xueru Zhang*, Ruibo Tu*, Yang Liu, Mingyan Liu, Hedvig Kjellström, Kun Zhang, and Cheng Zhang. How do fair decisions fare in long-term qualification? In Neural Information Processing Systems, 2020. (*Joint first authors)
- Video
- Code
Ruibo Tu, Kun Zhang, Bo Christer Bertilson, Hedvig Kjellström, and Cheng Zhang. Neuropathic pain diagnosis simulator for causal discovery algorithm evaluation. In Neural Information Processing Systems, 2019.
- Video
- Code
Cheng Zhang, Judith Bütepage, Hedvig Kjellström, and Stephan Mandt. Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(8), 2019
Charles Hamesse, Ruibo Tu, Paul Ackermann, Hedvig Kjellström, and Cheng Zhang. Simultaneous measurement imputation and outcome prediction for Achilles tendon rupture rehabilitation. In Machine Learning for Healthcare, 2019.
Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Clark Glymour, Hedvig Kjellström, and Kun Zhang. Causal discovery in the presence of missing data. In International Conference on Artificial Intelligence and Statistics, 2019.
- Code
Cheng Zhang, Hedvig Kjellström, and Stephan Mandt. Determinantal point processes for mini-batch diversification. In Conference on Uncertainty in Artificial Intelligence, 2017.
Cheng Zhang, Hedvig Kjellström, and Carl Henrik Ek. Inter-battery topic representation learning. In European Conference on Computer Vision, 2016.
Cheng Zhang, Hedvig Kjellström, Carl Henrik Ek, and Bo Christer Bertilson. Diagnostic prediction using discomfort drawings with IBTM. In Machine Learning for Healthcare, 2016.

Aerial Robotic Choir - expressive body language in different embodiments (KTH, 2016-2021)

During ancient times, the choir (χορος, khoros) had a major function in the classical Greek theatrical plays - commenting on and interacting with the main characters in the drama. We aim to create a robotic choir, invited to take part in a full-scale operatic performance in Rijeka, Croatia, September 2020 - thereby grounding our technological research in an ancient theatrical and operatic tradition. In our re-interpretation, the choir will consist of a swarm of small flying drones that have perceptual capabilities and thereby will be able to interact with human singers, reacting to their behavior both as individual agents, and as a swarm.

Collaborators

Carl Unander-Scharin (Ingesund School of Music, Sweden, composer and opera singer)
Åsa Unander-Scharin (Luleå University of Technology, Sweden, choreographer and dancer)

Publications

Sara Eriksson, Åsa Unander-Scharin, Vincent Trichon, Carl Unander-Scharin, Hedvig Kjellström, and Kristina Höök. Dancing with drones: Crafting novel artistic expressions through intercorporeality. In ACM CHI Conference on Human Factors in Computing Systems, 2019.

EACare: Embodied Agent to support elderly mental wellbeing (SSF, 2016-2021)

The main goal of the multidisciplinary project EACare is to develop an embodied agent – a robot head with communicative skills – capable of interacting with especially elderly people at a clinic or in their home, analyzing their mental and psychological status via powerful audiovisual sensing and assessing their mental abilities to identify subjects in high risk or possibly at the first stages of cognitive decline, with a special focus on Alzheimer’s disease. The interaction is performed according to the procedures developed for memory evaluation sessions, the key part of the diagnostic process for detecting cognitive decline.
This new diagnostic system will be one of the means by which medical doctors evaluate people for cognitive decline, in parallel to the existing methods such as memory evaluation sessions with a (human) clinician, MRI scans, blood tests, etc. Different parts of the framework can also be used for other purposes, such as to develop tools for dementia preventive training and for decision support during clinical memory evaluation sessions.

Collaborators

Jonas Beskow (KTH)
Joakim Gustafson (KTH)
Gustav Eje Henter (KTH)
Miia Kivipelto (Karolinska Institutet, Sweden)

Publications

Olga Mikheeva, Ieva Kazlauskaite, Adam Hartshorne, Hedvig Kjellström, Carl-Henrik Ek, and Neill DF Campbell. Aligned multi-task Gaussian process. In International Conference on Artificial Intelligence and Statistics, 2022.
Taras Kucherenko, Rajmund Nagy, Michael Neff, Hedvig Kjellström, and Gustav Eje Henter. Multimodal analysis of the predictability of hand-gesture properties. In International Conference on Autonomous Agents and Multi-Agent Systems, 2022.
- Videos and code
Taras Kucherenko, Rajmund Nagy, Patrik Jonell, Michael Neff, Hedvig Kjellström, and Gustav Eje Henter. Speech2Properties2Gestures: Gesture-property prediction as a tool for generating representational gestures from speech. In ACM International Conference on Intelligent Virtual Agents, 2021.
- Videos and code
- IVA 2021 Honorable Mention extended abstracts
Rajmund Nagy*, Taras Kucherenko*, Birger Moëll, André Pereira, Hedvig Kjellström, and Ulysses Bernardet. A framework for integrating gesture generation models into interactive conversational agents. In International Conference on Autonomous Agents and Multiagent Systems, demo track, 2021. (*Joint first authors)
- Videos and code
Patrik Jonell*, Birger Moëll*, Krister Håkansson*, Gustav Eje Henter, Taras Kucherenko, Olga Mikheeva, Göran Hagman, Jasper Holleman, Miia Kivipelto, Hedvig Kjellström, Joakim Gustafson and Jonas Beskow. Multimodal capture of patient behaviour for improved detection of early dementia: Clinical feasibility and preliminary results. Frontiers in Computer Science 3, 2021. (*Joint first authors)
Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, and Hedvig Kjellström. Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation. International Journal of Human-Computer Interaction 37(14), 2021.
Krister Håkansson, Jonas Beskow, Hedvig Kjellström, Joakim Gustafsson, Alexandre Bonnard, Marie Rydén, Sara Stormoen, Göran Hagman, Ulrika Akenine, Kristal Morales Peres, Gustav Henter, Maria Sundström, and Miia Kivipelto. Robot-assisted detection of subclinical dementia: progress report and preliminary findings. Alzheimer's & Dementia 16(S6):e043311, 2020.
Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, and Hedvig Kjellström. Gesticulator: A framework for semantically-aware speech-driven gesture generation. In ACM International Conference on Multimodal Interaction, 2020.
- Videos and code
- ICMI 2020 Best Paper award
Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, and Hedvig Kjellström. Analyzing input and output representations for speech-driven gesture generation. In ACM Intelligent Virtual Agents Conference, 2019.
- Videos and code
Olga Mikheeva, Carl Henrik Ek, and Hedvig Kjellström. Perceptual facial expression representation. In IEEE International Conference on Automatic Face and Gesture Recognition, 2018.

Project home page

Data-driven modelling of interaction skills for social robots (ICT-TNG 2016-2018)

This project aims to investigate fundamentals of situated and collaborative multi-party interaction and collect the data and knowledge required to build social robots that are able to handle collaborative attention and co-present interaction. In the project we will employ state-of-the art motion- and gaze tracking on a large scale as the basis for modelling and implementing critical non-verbal behaviours such as joint attention, mutual gaze and backchannels in situated human-robot collaborative interaction, in a fluent, adaptive and context sensitive way.

Collaborators

Jonas Beskow (KTH)

Publications

Kalin Stefanov, Giampiero Salvi, Dimosthenis Kontogiorgos, Hedvig Kjellström, and Jonas Beskow. Modeling of human visual attention in multiparty open-world dialogues. ACM Transactions on Human-Robot Interaction 8(2):1-21, 2019.
Yanxia Zhang, Jonas Beskow, and Hedvig Kjellström. Look but don't stare: Mutual gaze interaction in social robots. In International Conference on Social Robotics, 2017.

SocSMCs: Socialising SensoriMotor Contingencies (EU H2020 2015-2018)

As robots become more omnipresent in our society, we are facing the challenge of making them more socially competent. However, in order to safely and meaningfully cooperate with humans, robots must be able to interact in ways that humans find intuitive and understandable. Addressing this challenge, we propose a novel approach for understanding and modelling social behaviour and implementing social coupling in robots. Our approach presents a radical departure from the classical view of social cognition as mind- reading, mentalising or maintaining internal rep-resentations of other agents. This project is based on the view that even complex modes of social interaction are grounded in basic sensorimotor interaction patterns. SensoriMotor Contingencies (SMCs) are known to be highly relevant in cognition. Our key hypothesis is that learning and mastery of action-effect contingencies are also critical to enable effective coupling of agents in social contexts. We use "socSMCs" as a shorthand for such socially relevant action-effect contingencies. We will investigate socSMCs in human-human and human-robot social interaction scenarios. The main objectives of the project are to elaborate and investigate the concept of socSMCs in terms of information-theoretic and neurocomputational models, to deploy them in the control of humanoid robots for social entrainment with humans, to elucidate the mechanisms for sustaining and exercising socSMCs in the human brain, to study their breakdown in patients with autism spectrum disorders, and to benchmark the socSMCs approach in several demonstrator scenarios. Our long term vision is to realize a new socially competent robot technology grounded in novel insights into mechanisms of functional and dysfunctional social behavior, and to test novel aspects and strategies for human-robot interaction and cooperation that can be applied in a multitude of assistive roles relying on highly compact computational solutions.

My part of the project comprised human motion and activity forecasting.

Collaborators

Danica Kragic (KTH)
Michael Black (Max Planck Institute for Intelligent Systems, Germany)
Judith Bütepage (Electronic Arts, Sweden)

Publications

Judith Bütepage, Hedvig Kjellström, and Danica Kragic. Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration. In IEEE International Conference on Robotics and Automation, 2018.
Judith Bütepage, Michael J. Black, Danica Kragic, and Hedvig Kjellström. Deep representation learning for human motion prediction and classification. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

Analyzing the motion of musical conductors (KTH, 2014-2017)

Classical music sound production is structured by an underlying manuscript, the sheet music, that specifies into some detail what will happen in the music. However, the sheet music specifies only up to a certain degree how the music sounds when performed by an orchestra; there is room for considerable variation in terms of timbre, texture, balance between instrument groups, tempo, local accents, and dynamics. In larger ensembles, such as symphony orchestras, the interpretation of the sheet music is done by the conductor. We propose to learn a simplified generative model of the entire music production process from data; the conductor's articulated body motion in combination with the produced orchestra sound. This model can be exploited for two applications; the first is "home conducting" systems, i.e., conductor-sensitive music synthesizers, the second is tools for analyzing conductor-orchestra communication, where latent states in the conducting process are inferred from recordings of conducting motion and orchestral sound.

Collaborators

Anders Friberg (KTH)

Publications

Kelly Karipidou*, Josefin Ahnlund*, Anders Friberg, Simon Alexanderson, and Hedvig Kjellström. Computer analysis of sentiment interpretation in musical conducting. In IEEE Conference on Automatic Face and Gesture Recognition, 2017. (*Joint first authors)
- Dataset and videos

FOVIAL: FOrensic VIdeo AnaLysis - finding out what really happened (VR, EU Marie Curie 2013-2016)

In parallel to the massive increase of text data available on the Internet, there has been a corresponding increase in the amount of available surveillance video. There are good and bad aspects of this. One undeniably positive aspect is that it is possible to gather evidence from surveillance video when investigating crimes or the causes of accidents; forensic video analysis. Forensic video investigations are now carried out manually. This involves a huge and very tedious effort; e.g., 60 000 hours of video in the Breivik investigation. The amount of surveillance data is also constantly growing. This means that in future investigations, it will no longer be possible to go through all the evidence manually. The solution is to automate parts of the process. In this project we propose to learn an event model from surveillance data, that can be used to characterize all events in a new set of surveillance recorded from a camera network. Our model will also represent the causal dependencies and correlations between events. Using this model, or explanation, of the data from the network, a semi-automated forensic video analysis tool with a human in the loop will be designed, where the user chooses a certain event, e.g., a certain individual getting off a train, and the system returns all earlier observations of this individual, or all other instances of people getting off trains, or all the events that may have caused or are correlated with the given "person getting off train" event.
There are two activities in this project, development of methods for image retrieval, and development of core topic model methodologies for video classification.

Collaborators

Carl Henrik Ek (University of Cambridge, UK)
Cheng Zhang (Ellison Institute of Technology Oxford, UK)

Publications

David Geronimo and Hedvig Kjellström. Unsupervised surveillance video retrieval based on human action and appearance. In IAPR International Conference on Pattern Recognition, 2014.
Cheng Zhang, Carl Henrik Ek, Andreas Damianou, and Hedvig Kjellström. Factorized topic models. In International Conference on Learning Representations, 2013.

RoboHow (EU FP7 2013-2016)

Robohow aims at enabling robots to competently perform everyday human-scale manipulation activities - both in human working and living environments. In order to achieve this goal, Robohow pursues a knowledge-enabled and plan-based approach to robot programming and control. The vision of the project is that of a cognitive robot that autonomously performs complex everyday manipulation tasks and extends its repertoire of such by acquiring new skills using web-enabled and experience-based learning as well as by observing humans.

My part of the project comprised object tracking and visuo-haptic object exploration.

Collaborators

Danica Kragic (KTH)

Publications

Sergio Caccamo, Püren Güler, Hedvig Kjellström, and Danica Kragic. Active perception and modeling of deformable surfaces using Gaussian processes and position-based dynamics. In IEEE-RAS International Conference on Humanoid Robots, 2016.
Alessandro Pieropan, Niklas Bergström, Masatoshi Ishikawa, Danica Kragic, and Hedvig Kjellström. Robust tracking of unknown objects through adaptive size estimation and appearance learning. In IEEE International Conference on Robotics and Automation, 2016.
Alessandro Pieropan, Niklas Bergström, Masatoshi Ishikawa, and Hedvig Kjellström. Robust and adaptive keypoint-based object tracking. Advanced Robotics 30(4), 2016.
Püren Güler, Karl Pauwels, Alessandro Pieropan, Hedvig Kjellström, and Danica Kragic. Estimating the deformability of elastic materials using optical flow and position-based dynamics. In IEEE-RAS International Conference on Humanoid Robots, 2015.

TOMSY: TOpology based Motion SYnthesis for dexterous manipulation (EU FP7 2011-2014)

The aim of TOMSY is to enable a generational leap in the techniques and scalability of motion synthesis algorithms. We propose to do this by learning and exploiting appropriate topological representations and testing them on challenging domains of flexible, multi-object manipulation and close contact robot control and computer animation. Traditional motion planning algorithms have struggled to cope with both the dimensionality of the state and action space and generalisability of solutions in such domains. This proposal builds on existing geometric notions of topological metrics and uses data driven methods to discover multi-scale mappings that capture key invariances - blending between symbolic, discrete and continuous latent space representations. We will develop methods for sensing, planning and control using such representations.

Collaborators

Danica Kragic (KTH)
Carl Henrik Ek (University of Cambridge, UK)
Cheng Zhang (Ellison Institute of Technology Oxford, UK)

Publications

Alessandro Pieropan, Carl Henrik Ek, and Hedvig Kjellström. Recognizing object affordances in terms of spatio-temporal object-object relationships. In IEEE-RAS International Conference on Humanoid Robots, 2014.
Alessandro Pieropan, Giampiero Salvi, Karl Pauwels, and Hedvig Kjellström. Audio-visual classification and detection of human manipulation actions. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.
- Dataset
- Video
Alessandro Pieropan and Hedvig Kjellström. Unsupervised object exploration using context. In IEEE International Symposium on Robot and Human Interactive Communication, 2014.
Javier Romero, Thomas Feix, Carl Henrik Ek, Hedvig Kjellström, and Danica Kragic. Extracting postural synergies for robotic grasping. IEEE Transactions on Robotics, 29(6), 2013.
- Video
Cheng Zhang, Dan Song, and Hedvig Kjellström. Contextual modeling with labeled multi-LDA. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
Javier Romero, Hedvig Kjellström, Carl Henrik Ek, and Danica Kragic. Non-parametric hand pose estimation with object context. Image and Vision Computing, 31(8), 2013.
Alessandro Pieropan, Carl Henrik Ek, and Hedvig Kjellström. Functional object descriptors for human activity modeling. In IEEE International Conference on Robotics and Automation, 2013.
- Video
Martin Hjelm, Carl Henrik Ek, Renaud Detry, Hedvig Kjellström, and Danica Kragic. Sparse summarization of robotic grasp data. In IEEE International Conference on Robotics and Automation, 2013.
Akshaya Thippur, Carl Henrik Ek, and Hedvig Kjellström. Inferring hand pose: A comparative study of visual shape features. In IEEE International Conference on Automatic Face and Gesture Recognition, 2013.
Florian T. Pokorny, Carl Henrik Ek, Hedvig Kjellström, and Danica Kragic. Persistent homology for learning densities with bounded support. In Advances in Neural Information Processing Systems 25, 2012.

Gesture-based violin synthesis (KTH, 2011-2012)

There are many commercial applications of synthesized music from acoustic instruments, e.g. generation of orchestral sound from sheet music. Whereas the sound generation process of some types of instruments, like piano, is fairly well known, the sound of a violin has been proven extremely difficult to synthesize. The reason is that the underlying process is highly complex: The art of violin-playing involves extremely fast and precise motion with timing in the order of milliseconds.
We believe that ideas from Machine Learning can be employed to build better violin sound synthesizers. The task of this project is to use learning methods to create a generative model of violin sound from sheet music, using an intermediate representation of the kinematic system (violin and bow) generating the sound. To train the generative model, a database with motion capture of bowing will be used, containing a large set of bowing examples, performed by 6 professional violinists.

Collaborators

Anders Askenfelt (KTH)

Publications

Akshaya Thippur, Anders Askenfelt, and Hedvig Kjellström. Probabilistic modeling of bowing gestures for gesture-based violin sound synthesis. In Stockholm Music Acoustics Conference, 2013.

HumanAct: Visual and multi-modal learning of Human Activity and interaction with the surrounding scene (VR, EIT ICT Labs 2010-2013)

The overwhelming majority of human activities are interactive in the sense that they relate to the world around the human (in Computer Vision called the "scene"). Despite this, visual analyses of human activity very rarely take scene context into account. The objective in this project is modeling of human activity with object and scene context.

The methods developed within the project will be applied to the task of Learning from Demonstration, where a (household) robot learns how to perform a task (e.g. preparing a dish) by watching a human perform the same task.

Collaborators

Michael Black (Max Planck Institute for Intelligent Systems, Germany)
Carl Henrik Ek (University of Cambridge, UK)
Cheng Zhang (Ellison Institute of Technology Oxford, UK)

Publications

Cheng Zhang, Carl Henrik Ek, Andreas Damianou, and Hedvig Kjellström. Factorized topic models. In International Conference on Learning Representations, 2013.
Cheng Zhang, Dan Song, and Hedvig Kjellström. Contextual modeling with labeled multi-LDA. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013.
Alessandro Pieropan, Carl Henrik Ek, and Hedvig Kjellström. Functional object descriptors for human activity modeling. In IEEE International Conference on Robotics and Automation, 2013.
- Video
Hedvig Kjellström, Javier Romero, and Danica Kragic. Visual object-action recognition: Inferring object affordances from human demonstration. Computer Vision and Image Understanding 115, 2011.
Hedvig Kjellström, Danica Kragic, and Michael J. Black. Tracking people interacting with objects. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.

PACO-PLUS: Perception, Action and COgnition through learning of object-action complexes (EU FP6 2007-2010)

The EU project PACO-PLUS brings together an interdisciplinary research team to design and build cognitive robots capable of developing perceptual, behavioural and cognitive categories that can be used, communicated and shared with other humans and artificial agents. In my part of the project, we are interested in programming by demonstration applications, in which a robot learns how to perform a task by watching a human do the same task. This involves learning about the scene, objects in the scene, and actions performed on those objects. It also involves learning grammatical structures of actions and objects involved in a task.

Collaborators

Jan-Olof Eklundh (KTH)
Danica Kragic (KTH)

Publications

Javier Romero, Hedvig Kjellström, Carl Henrik Ek, and Danica Kragic. Non-parametric hand pose estimation with object context. Image and Vision Computing 31(8), 2013.
Hedvig Kjellström, Javier Romero, and Danica Kragic. Visual object-action recognition: Inferring object affordances from human demonstration. Computer Vision and Image Understanding 115, 2011.

Hedvig Kjellström,
Professor
hedvig@kth.se
+46 8 790 69 06

Studies

Research

Collaboration

About KTH

Library

Projects

Permanent Activity

Current Projects

Past Projects

Portfolio

Contact