
Area 1: Human Aesthetic Expression


In this area we explore the aesthetic aspects of human communicative behavior. For example, what are the mechanisms of communication between a musical conductor and an orchestra, and how do the musicians interpret the conductor's motion? Why do humans find watching a dancer stimulating? And how can a completely different embodiment, e.g., a swarm of drones, express feelings and attitudes while performing on stage? We investigate these questions in collaboration with a range of performing arts professionals: musicians, conductors, and dancers.


Research Engineers

  • Mert Mermerci (2021-2022)
  • Tobias Jaeger (2020-2021)
  • Vincent Trichon (2017-2019)
  • Akshaya Thippur (2012)

MSc Students

  • Jacob Stuart (MSc 2022)
  • Josefin Ahnlund (MSc 2016)
  • Kelly Karipidou (MSc 2015)

Collaborators


  • Emily Cross (Macquarie University, Australia and University of Glasgow, UK, dancer)
  • Fredrik Ullén (Max Planck Institute for Empirical Aesthetics, Germany, concert pianist)
  • Carl Unander-Scharin (Ingesund School of Music, Sweden, composer and opera singer)
  • Åsa Unander-Scharin (Luleå University of Technology, Sweden, choreographer and dancer)

Current Projects

OrchestrAI: Deep generative models of the communication between conductor and orchestra (WASP 2023-present)


In this project, which is part of WARA Media and Language and a collaboration with the Max Planck Institute for Empirical Aesthetics, we build computer models of the processes by which humans communicate in music performance, and use these both to 1) learn about the underlying processes and 2) build different kinds of interactive applications. We focus on the communication between a conductor and an orchestra: a process based on the non-verbal communication of cues and instructions via the conductor's hand, arm, and upper-body motion, as well as facial expressions and gaze patterns.
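To make the idea of modeling conductor motion concrete, here is a minimal toy sketch (not the project's actual model; all signals and parameters are hypothetical) that estimates tempo from a conductor's vertical hand trajectory by treating local minima of the trajectory as beat points, the "ictus" of each conducting gesture:

```python
import numpy as np

def estimate_tempo(hand_y, fps):
    """Estimate beats per minute from a conductor's vertical hand
    position: local minima of the trajectory are taken as beat points."""
    # A sample is a beat point if it is lower than both neighbours.
    minima = np.where(
        (hand_y[1:-1] < hand_y[:-2]) & (hand_y[1:-1] < hand_y[2:])
    )[0] + 1
    if len(minima) < 2:
        return 0.0, minima
    # Mean interval between beat points, converted to beats per minute.
    beat_period = np.diff(minima).mean() / fps   # seconds per beat
    return 60.0 / beat_period, minima

# Synthetic trajectory: a 2 Hz up-down hand motion (120 BPM) at 100 fps.
fps = 100
t = np.arange(0, 5, 1 / fps)
hand_y = np.cos(2 * np.pi * 2.0 * t)

bpm, beats = estimate_tempo(hand_y, fps)
print(round(bpm))  # 120
```

A real system would of course work with full-body motion capture and learned models rather than a single hand coordinate, but the sketch illustrates the kind of signal the models operate on.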

In the first part of the project, a museum installation is being designed for the omni-theater Wisdome Stockholm at Tekniska Museet, in collaboration with the Swedish Radio Symphony Orchestra.


  • Mert Mermerci and Hedvig Kjellström. Creating an immersive virtual orchestra conducting experience. In CVPR Workshop on Computer Vision for Fashion, Art, and Design, 2024.

Past Projects

Aerial Robotic Choir - expressive body language in different embodiments (KTH, 2016-2021)

In ancient times, the choir (χορος, khoros) had a major function in classical Greek theatrical plays, commenting on and interacting with the main characters of the drama. We aimed to create a robotic choir, invited to take part in a full-scale operatic performance in Rijeka, Croatia, in September 2020, thereby grounding our technological research in an ancient theatrical and operatic tradition. In our re-interpretation, the choir consisted of a swarm of small flying drones with perceptual capabilities, able to interact with human singers and react to their behavior both as individual agents and as a swarm.


Analyzing the motion of musical conductors (KTH, 2014-2017)

Classical music sound production is structured by an underlying manuscript, the sheet music, which specifies in some detail what will happen in the music. However, the sheet music determines only to a certain degree how the music sounds when performed by an orchestra; there is room for considerable variation in terms of timbre, texture, balance between instrument groups, tempo, local accents, and dynamics. In larger ensembles, such as symphony orchestras, the interpretation of the sheet music is the task of the conductor. We proposed to learn a simplified generative model of the entire music production process from data: the conductor's articulated body motion in combination with the produced orchestral sound. This model can be exploited for two applications: the first is "home conducting" systems, i.e., conductor-sensitive music synthesizers; the second is tools for analyzing conductor-orchestra communication, where latent states in the conducting process are inferred from recordings of conducting motion and orchestral sound.
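Inferring latent states from observed motion and sound is the kind of task a hidden Markov model handles. The sketch below is a minimal illustration, not the project's method: a hand-written Viterbi decoder over two hypothetical conducting states ("steady beat" vs. "expressive rubato") emitting quantized motion features (0 = small smooth motion, 1 = large accented motion), with invented transition and emission probabilities.

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely hidden state sequence for a discrete HMM.
    obs: observation indices; log_A: transition log-probs (from, to);
    log_B: emission log-probs (state, symbol); log_pi: initial log-probs."""
    T = len(obs)
    n_states = log_A.shape[0]
    delta = np.zeros((T, n_states))          # best log-prob ending in state
    psi = np.zeros((T, n_states), dtype=int)  # best predecessor state
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # shape (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Invented parameters: sticky states with distinctive emissions.
A = np.array([[0.9, 0.1], [0.2, 0.8]])   # state transition probabilities
B = np.array([[0.9, 0.1], [0.1, 0.9]])   # emission probabilities
pi = np.array([0.5, 0.5])                # initial state probabilities

obs = np.array([0, 0, 0, 1, 1, 1, 0, 0])
states = viterbi(obs, np.log(A), np.log(B), np.log(pi))
print(states)  # [0 0 0 1 1 1 0 0]
```

The project's models operate on continuous motion-capture and audio features rather than a two-symbol alphabet, but the decoding principle, recovering a hidden interpretive state sequence from observable signals, is the same.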


Gesture-based violin synthesis (KTH, 2011-2012)

There are many commercial applications of synthesized music from acoustic instruments, e.g., the generation of orchestral sound from sheet music. Whereas the sound generation process of some instruments, such as the piano, is fairly well understood, the sound of a violin has proven extremely difficult to synthesize. The reason is that the underlying process is highly complex: the art of violin playing involves extremely fast and precise motion, with timing on the order of milliseconds.
We believe that ideas from machine learning can be employed to build better violin sound synthesizers. The task of this project was to use learning methods to create a generative model of violin sound from sheet music, using an intermediate representation of the kinematic system (violin and bow) that generates the sound. To train the generative model, a motion-capture database of bowing was used, containing a large set of bowing examples performed by six professional violinists.
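The pipeline of sheet music, through an intermediate instrument representation, to sound can be sketched with a much simpler physically-inspired model. The toy example below uses Karplus-Strong plucked-string synthesis as a stand-in for a learned bowed-string model (a real violin is bowed, not plucked, and the project used a learned generative model instead); the note list and frequency mapping are illustrative.

```python
import numpy as np

def karplus_strong(freq, duration, sr=44100, decay=0.996):
    """Toy string synthesis (Karplus-Strong plucked string),
    standing in for a learned bowed-string sound model."""
    n = int(sr * duration)
    period = int(sr / freq)           # delay-line length sets the pitch
    rng = np.random.default_rng(0)
    buf = rng.uniform(-1, 1, period)  # initial noise burst (excitation)
    out = np.empty(n)
    for i in range(n):
        out[i] = buf[i % period]
        # Averaging filter: low-pass + decay of the string loop.
        buf[i % period] = decay * 0.5 * (
            buf[i % period] + buf[(i + 1) % period]
        )
    return out

# "Sheet music": note names with fundamental frequencies (Hz) and
# durations (s), a hypothetical stand-in for the score input.
score = [("G3", 196.0, 0.3), ("D4", 293.7, 0.3), ("A4", 440.0, 0.3)]
audio = np.concatenate([karplus_strong(f, d) for _, f, d in score])
print(audio.shape)  # (39690,) -- 0.9 s of audio at 44.1 kHz
```

Writing the audio to a WAV file and listening makes the decaying string character audible; the point of the project was to replace this hand-crafted sound model with one learned from motion-capture data of real bowing.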