The course consists of lectures, three laboratory sessions with hand-in assignments, and a written thesis on a subject chosen in consultation with the teacher. The thesis is also presented orally at a final seminar. The laboratory sessions consist of designing different parts of a speech recognition application, training the system, and evaluating its performance.
The following theoretical components are included:
- algorithms for training, recognition, and adaptation to the properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
- methods to decrease the sensitivity to disturbances and deviations
- probability theory
- signal processing and parameter extraction
- acoustic modelling of the static and dynamic spectral properties of speech sounds
- statistical modelling of language in spontaneous and formal speech
- search strategies: basic methods and strategies for large vocabularies
- specific analysis and decision-making methods for speaker recognition.
Furthermore, the course gives some practical insight into building an application. This includes implementing certain functions based on prototypes and testing them on real speech data.