Version created by Gabriel Skantze 2021-04-12 10:33

Internal Projects

Master Thesis in Speech and Music Communication:

Topic: Automatic grading of voice recording quality

When voices are recorded for clinical purposes, the recordings have to be of sufficient quality, but the clinician has neither the expertise nor the time to judge that quality. This project is about developing an analyzer of voice/speech recordings that automatically identifies signal-related problems that a technical support person should fix, such as hum, electrical noise, clipping, ventilation noise, background noise, incorrect microphone distance, puff or wind noise, a failing microphone cable, etc. The research question could be: how, and to what extent, can data-driven machine learning be used to guide the collection of trustworthy data, and thereby improve the reliability of subsequent analysis? The resulting analyzer is ultimately intended as a component of a larger system for clinical voice analysis that is being prototyped in parallel.
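
To make the kind of problem the analyzer should catch more concrete, here is a minimal sketch of two hand-set checks in plain Python: clipping, measured as the fraction of samples at or near full scale, and mains hum, measured as the share of signal energy at 50 Hz and its first harmonics using the Goertzel algorithm. It assumes mono float samples in [-1, 1], 50 Hz mains, and arbitrary thresholds; the function names and thresholds are hypothetical. Rules like these would at most serve as a baseline, or as feature extractors, for the data-driven analyzer the project calls for.

```python
import math


def clipping_ratio(samples, threshold=0.99):
    """Fraction of samples whose magnitude reaches `threshold` (full scale is 1.0)."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def goertzel_power(samples, fs, freq):
    """Squared DFT magnitude |X[k]|^2 at the bin nearest `freq` (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq / fs)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def hum_ratio(samples, fs, mains=50.0, harmonics=3):
    """Share of signal energy at the mains frequency and its first harmonics.

    By Parseval, a real sinusoid at bin k carries 2 * |X[k]|^2 / (N * energy)
    of the total energy, so a pure mains tone gives a ratio of about 1.0.
    """
    n = len(samples)
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return 0.0
    hum = sum(goertzel_power(samples, fs, mains * h) for h in range(1, harmonics + 1))
    return 2.0 * hum / (n * energy)


def flag_problems(samples, fs, clip_limit=0.001, hum_limit=0.1):
    """Return problem labels; both limits are arbitrary, hypothetical thresholds."""
    problems = []
    if clipping_ratio(samples) > clip_limit:
        problems.append("clipping")
    if hum_ratio(samples, fs) > hum_limit:
        problems.append("hum")
    return problems


if __name__ == "__main__":
    fs = 8000
    t = [i / fs for i in range(fs)]  # one second of audio
    # A 220 Hz tone as a crude stand-in for voice, plus 50 Hz hum.
    voice_with_hum = [0.5 * math.sin(2 * math.pi * 220 * x)
                      + 0.3 * math.sin(2 * math.pi * 50 * x) for x in t]
    print(flag_problems(voice_with_hum, fs))  # ['hum']
```

The hum measure assumes that the analysis window holds a whole number of mains cycles (one second here); with other window lengths, spectral leakage spreads the hum energy over neighbouring bins and the ratio underestimates it.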

Required skills:
- Knowledge in analog and digital audio
- Good programming skills
- Knowledge in Machine Learning is a plus (equivalent to KTH machine learning course)

Recommended reading:
- Perceptual Evaluation of Audio Quality (PEAQ) (https://www.opticom.de/)
- POLQA (http://www.polqa.info/)
- Matthias Mauch and Sebastian Ewert, The Audio Degradation Toolbox and its Application to Robustness Evaluation, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013). (https://code.soundsoftware.ac.uk/projects/audio-degradation-toolbox)

Contact:
Sten Ternström (stern@kth.se) or Bob Sturm (bobs@kth.se)

-------------------------------------------------------

Master Theses in Human-Robot Interaction:

Topic:
In contingent interpersonal interactions, we develop a neural sense of grounding when the quality, intensity, and timing of others’ signals clearly reflect the signals that we have sent. In HRI, we operationalise contingency as a correlation between robot behaviour and changes in its environment.
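
One simple way to operationalise contingency as a correlation is to compare a robot behaviour signal with a signal describing change in its environment (for example, a per-frame indicator of a human response), at a range of time lags, and keep the lag with the strongest correlation. The sketch below assumes both signals are already sampled on the same frame grid; the function names, the lag search, and the choice of Pearson correlation are illustrative assumptions, not a prescribed method for the project.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def contingency(robot, env, max_lag):
    """Find the lag (in frames) at which env[t + lag] correlates most with robot[t].

    Returns (best_lag, r). A high r at a short, plausible lag suggests that the
    environment changes contingently on what the robot does.
    """
    best_lag, best_r = 0, pearson(robot, env)
    for lag in range(1, max_lag + 1):
        r = pearson(robot[:-lag], env[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


if __name__ == "__main__":
    # Hypothetical data: the robot nods at some frames, the human smiles two frames later.
    robot_nod = [1.0 if t % 7 in (0, 1) else 0.0 for t in range(100)]
    human_smile = [0.0, 0.0] + robot_nod[:-2]
    print(contingency(robot_nod, human_smile, max_lag=5))  # lag 2, r close to 1.0
```

Because the robot signal in this toy example repeats every 7 frames, the lag search is limited to 5 frames: with periodic behaviour, correlation peaks recur at multiples of the period, so the lag range should follow the expected human response time.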

Given a set of social actions, it is important for a robot to know what is appropriate to do while in dialogue with humans. In this master thesis project, you will investigate quantitative and qualitative indicators to assess human reactions in human-robot dialogue. You will design the interaction and a task-oriented dialogue, and explore objective and subjective measures from human users. Further, you will experiment with sensor data and build a machine learning classifier to identify which features from human users contribute to understanding of robot actions.

You will experiment with open-source platforms such as OpenFace and openSMILE and one of our robotic platforms (Furhat or Nao) to build an application that combines multimodal signals and generates appropriate robot responses.

Required skills:
- Knowledge in human-computer interaction
- Good programming skills in Python
- Knowledge in Machine Learning is a plus (equivalent to KTH machine learning course)

Contact:

Dimos Kontogiorgos (diko@kth.se) or Joakim Gustafson (jocke@speech.kth.se)

-------------------------------------------------------

Master Thesis in Speech and Music Communication:

Topic: Model-based synthesis of singing

Although current trends in voice and speech synthesis focus on data-driven methods based on machine learning (ML), there are still many outstanding research questions in voice acoustics that are best investigated using an acoustic or physical model of the voice. This thesis project is about reviving a real-time source-filter singing synthesizer from the 1990s in a modern software environment, improving it according to new research findings, and conducting listening tests on the results. There are good prospects for collaboration with a student of electroacoustic composition at Kungl Musikhögskolan.
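
For readers new to source-filter synthesis, the sketch below shows the basic principle in plain Python: a periodic impulse train stands in for the glottal source, and a cascade of two-pole digital resonators (the resonator formulation of Klatt's cascade/parallel formant synthesizer) shapes it into a vowel. The formant frequencies and bandwidths in the usage lines are rough, illustrative values, and a usable singing synthesizer would need a realistic glottal pulse shape, vibrato, and time-varying parameters. This is not the 1990s synthesizer itself, only an illustration of the source-filter idea.

```python
import math


def impulse_train(f0, fs, n):
    """Unit impulses every fs/f0 samples: a crude stand-in for the glottal source."""
    out = [0.0] * n
    period = fs / f0
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += period
    return out


def resonator(x, freq, bw, fs):
    """Two-pole digital resonator (Klatt-style) with unity gain at 0 Hz."""
    T = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * T)
    b = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0, formants, fs=16000, dur=0.5):
    """Source-filter synthesis: impulse train filtered by cascaded formant resonators."""
    signal = impulse_train(f0, fs, int(fs * dur))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to full scale


if __name__ == "__main__":
    # Rough, illustrative (frequency Hz, bandwidth Hz) pairs for an open vowel.
    vowel = synthesize_vowel(220.0, [(700, 80), (1220, 90), (2600, 120)])
    print(len(vowel))  # 8000 samples = 0.5 s at 16 kHz
```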

Required skills:
- Knowledge of digital audio, including filtering, and of music technology
- Very good programming skills. The project will use SuperCollider, C++ and possibly also PureData and/or LISP.
- Keen interest in learning more about music and/or human voice

Recommended reading:
- Carlsson, G. (1988): "The KTH program for synthesis of singing", Master's thesis, KTH-TMH, Royal Institute of Technology, Stockholm. Available for loan at the department.
- Friberg, A. (1991): "Generative Rules for Music Performance: A Formal Description of a Rule System"

Contact:

Sten Ternström (stern@kth.se) or Anders Friberg (afriberg@kth.se)

-------------------------------------------------------

Learning the muscle activation-to-acoustics mapping using a (deep) neural network

Biomechanical models of the human speech production apparatus have recently been developed (www.artisynth.org). The purpose of such a model is to study speech production and to understand the relation between muscle activation patterns, articulation and acoustics. Achieving this purpose requires a large number of simulations with the model. An alternative is to choose a limited number of patterns from the muscle activation space, run the simulations, and save the articulatory and acoustic output. A neural network (NN) is then used to learn the relation between the two spaces, and the generalization capability of the NN is used to predict the articulatory or acoustic output for any muscle activation pattern. In this thesis, the biomechanical model will be used to generate training and test data sets for training and evaluating the NN. The results will be analyzed to explore how speech production is planned and what the limitations of this method are. The results of this study could be published at a conference or in a journal.
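
The workflow described above (sample the muscle activation space, simulate, then train a network on the saved input-output pairs) can be sketched in plain Python. Here `toy_simulator` is a hypothetical stand-in for a biomechanical simulation, mapping three activations to a single acoustic feature, and the network is a one-hidden-layer tanh perceptron trained with stochastic gradient descent; the network size, learning rate, and number of epochs are arbitrary assumptions. In the thesis, each data point would come from a model run, and a deep-learning framework would replace the hand-written training loop.

```python
import math
import random


def toy_simulator(act):
    """HYPOTHETICAL stand-in for one biomechanical simulation run:
    three muscle activations in [0, 1] -> one acoustic feature."""
    a1, a2, a3 = act
    return math.sin(3.0 * a1) * a2 + a3 ** 2


def make_dataset(n, rng):
    """Sample activation patterns uniformly and record the simulated output."""
    data = []
    for _ in range(n):
        act = [rng.random() for _ in range(3)]
        data.append((act, toy_simulator(act)))
    return data


class MLP:
    """One hidden tanh layer and a linear output, trained by SGD on squared error."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def sgd_step(self, x, target, lr):
        h, y = self.forward(x)
        err = y - target  # derivative of 0.5 * (y - target)^2 with respect to y
        for j, hj in enumerate(h):
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)  # backpropagate through tanh
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            row = self.w1[j]
            for i, xi in enumerate(x):
                row[i] -= lr * grad_pre * xi
        self.b2 -= lr * err


def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, test = make_dataset(200, rng), make_dataset(100, rng)
model = MLP(3, 10, rng)
for _ in range(200):  # epochs
    rng.shuffle(train)
    for x, y in train:
        model.sgd_step(x, y, lr=0.05)

train_mean = sum(y for _, y in train) / len(train)
baseline_mse = mse(lambda x: train_mean, test)  # always predict the training mean
test_mse = mse(model.predict, test)
print(f"test MSE {test_mse:.4f} vs. mean-predictor baseline {baseline_mse:.4f}")
```

Comparing against a predictor that always outputs the training mean gives a simple reference for how much of the activation-to-output relation the network has actually learned on unseen patterns.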

Requirements for applicant: Knowledge of neural networks and speech technology; MATLAB and Java programming
Suitable as: Master Project

Supervisor: Olov Engwall

Contact: Saeed Dabbaghchian

From vocal tract resonance frequencies to vocal tract area function

The human speech production apparatus is a very complex system that has been studied by researchers in many fields, and many questions remain unanswered. One aspect of speech is acoustics, which studies wave propagation in the human vocal tract. The vocal tract tube, described by the areas of its cross-sections (the area function), is analyzed to calculate its resonance frequencies. In some applications, we need to solve the inverse problem by estimating the area function that yields desired resonance frequencies. Based on Fant's perturbation theory, desired formants can be achieved using an iterative method. As an alternative to this method, one could generate samples of area functions and calculate the corresponding formants. A machine learning method (e.g. a neural network) is then used to learn the relationship between area function and formants. The generalization capability of the algorithm may be used to predict the area function for any unseen set of formants. The results of this study could be published at a conference or in a journal.
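
Both the forward problem and the data-driven alternative can be sketched with a lossless concatenated-tube model. The code below computes the resonances of a tube closed at the glottis and ideally open at the lips from the chain (ABCD) matrix of its sections: with zero pressure at the lips, the resonances are the zeros of the matrix's D element. It then builds a table of random area functions with their formants and inverts a formant target by nearest-neighbour lookup. The 17.5 cm tract length, the speed of sound, the area range, and the use of nearest neighbour in place of a neural network are all assumptions made for illustration; losses and lip radiation are ignored.

```python
import math
import random

SPEED_OF_SOUND = 35000.0  # cm/s, an assumed round value


def _matmul(m, n):
    """Product of two 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def chain_d(areas, section_len, freq):
    """D element of the glottis-to-lips chain matrix of lossless tube sections.

    [P_glottis, U_glottis] = K [P_lips, U_lips]; with P_lips = 0 (ideal open end),
    U_lips / U_glottis = 1 / D, so the resonances (formants) are the zeros of D.
    """
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    total = ((1.0, 0.0), (0.0, 1.0))
    for area in areas:
        z = 1.0 / area  # characteristic impedance up to rho*c, which cancels in D
        cos_kl, sin_kl = math.cos(k * section_len), math.sin(k * section_len)
        total = _matmul(total, ((cos_kl, 1j * z * sin_kl), (1j * sin_kl / z, cos_kl)))
    return total[1][1].real


def formants(areas, length=17.5, n_formants=3, fmax=5000.0, step=10.0):
    """Lowest resonances (Hz) of the tube, found by a frequency scan plus bisection."""
    section_len = length / len(areas)
    found = []
    f0, d0 = step, chain_d(areas, section_len, step)
    while f0 < fmax and len(found) < n_formants:
        f1 = f0 + step
        d1 = chain_d(areas, section_len, f1)
        if d0 == 0.0:
            found.append(f0)
        elif d0 * d1 < 0.0:  # sign change: bisect the bracket [f0, f1]
            lo, hi, d_lo = f0, f1, d0
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(areas, section_len, mid)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f0, d0 = f1, d1
    return found


def build_table(n_samples, n_sections=8, seed=0):
    """Random area functions (cm^2) paired with their first three formants."""
    rng = random.Random(seed)
    table = []
    for _ in range(n_samples):
        areas = [rng.uniform(0.5, 5.0) for _ in range(n_sections)]
        fmts = formants(areas)
        if len(fmts) == 3:
            table.append((fmts, areas))
    return table


def invert(target, table):
    """Nearest-neighbour inversion: the area function whose formants are closest to target."""
    return min(table, key=lambda e: sum((f - t) ** 2 for f, t in zip(e[0], target)))[1]


if __name__ == "__main__":
    print(formants([3.0] * 8))  # uniform tube: close to [500.0, 1500.0, 2500.0]
    table = build_table(100)
    print(invert([700.0, 1220.0, 2600.0], table))
```

For a uniform tube the model reproduces the quarter-wavelength resonances F_n = (2n - 1)c/(4L), i.e. 500, 1500 and 2500 Hz for L = 17.5 cm and c = 35000 cm/s. A lookup table can only return area functions it has already seen, which is exactly the limitation that the generalization of a trained network is meant to overcome.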

Requirements for applicant: Knowledge of speech technology and machine learning; MATLAB and Java programming
Suitable as: Master Project

Supervisor: Olov Engwall

Contact: Saeed Dabbaghchian

Master Thesis Proposals in Robotics: Factories of the future (FACT)

The project "Factories of the Future: Human Robot Cooperative Systems", or FACT for short, is a five-year endeavour in which the departments of Robotics, Perception and Learning and of Speech, Music and Hearing collaborate to develop methods that allow humans and robots to share the same workspace and perform object manipulation tasks jointly. One of the main enabling technologies needed to realise this is a framework that enables the robot to cooperate smoothly with the human, working towards the same goal, which may not be explicitly communicated to the robot before the task is initiated. For human-robot collaboration to become as efficient as human-human collaboration, a robot must be able to perform both the active and passive parts of the interaction, just as a human would. Building a system with these capabilities requires research beyond the state of the art in areas such as object handling and manipulation, programming by demonstration, natural and embodied interaction, control, and perception.

For more information about the project, look at the FACT webpage

For thesis project suggestions, please look at this page

We believe that each MSc project has to be tailored to the individual student, and we therefore do not list specific thesis projects. We want to define them together with you, based on what you know, what you and we are interested in, and what fits in the context of the project. A handful of PhD students are involved in the project, and we envision thesis projects connected to their topics. To find out more, and to be directed to the doctoral students involved, contact Patric Jensfelt (patric@kth.se) or Joakim Gustafson (jocke@speech.kth.se).