The course objective is to provide a systematic introduction to speech processing and recognition. Models of speech production and speech analysis will form a basis for understanding the problem of speech recognition. Probabilistic machine learning methods will be employed for the recognition task, including Hidden Markov Models, Gaussian Mixture Models, Support Vector Machines and Deep Neural Networks.
Information for research students about course offerings
PhD students can take the doctoral course with code 2F5118, which requires more extensive project work than DT2119.
Content and learning outcomes
The course consists of lectures, three laboratory sessions with hand-in assignments, and an essay on a subject chosen in consultation with the teacher. The essay is also presented orally at a final seminar. In the laboratory sessions, students design different parts of a speech recognition application, train the system and evaluate its performance.
The following theoretical course components are included:
- algorithms for training, recognition and adaptation to properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
- methods to decrease the sensitivity to disturbances and deviations
- probability theory
- signal processing and parameter extraction
- acoustic modelling of the static and dynamic spectral properties of speech sounds
- statistical modelling of language in spontaneous and formal speech
- search strategies - basic methods and strategies for large vocabularies
- specific methods for analysis and decision making, for recognition of speakers.
Furthermore, some practical insights into building an application are given. This includes the implementation of certain functions based on prototypes, and testing them on real speech data.
Intended learning outcomes
Having passed the course, the student shall be able to
- implement methods for training and evaluation of speech recognition systems
- train and evaluate a speech recognizer, using software tools
- compare different methods for feature extraction and training
- document and discuss specific aspects related to recognition of speech and of speakers
- review and criticise other students' work in the subject, based on the literature.
Literature and preparations
Some knowledge of machine learning, for example from DD2421, DD2434 or EN2202
Some programming knowledge, preferably in Python
Some knowledge of signal processing
Examination and completion
If the course is discontinued, students may request to be examined during the following two academic years.
- LAB1 - Computer Lab, 4.5 credits, grading scale: P, F
- PRO1 - Project, 3.0 credits, grading scale: A, B, C, D, E, FX, F
The examiner may apply another examination format when re-examining individual students.
Other requirements for final grade
Academic paper and its presentation at a final review
Assessment of two other course participants' papers, and critical review of their presentations.
Opportunity to complete the requirements via supplementary examination
Opportunity to raise an approved grade via renewed examination
- All members of a group are responsible for the group's work.
- In any assessment, every student shall honestly disclose any help received and sources used.
- In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.
Further information about the course can be found on the Course web at the link below. Information on the Course web will later be moved to this site.
Course web DT2119
Main field of study
The course may be canceled or given in another form if the number of regular registrations is too low.
In this course, the EECS code of honor applies, see: