DT2119 Speech and Speaker Recognition 7.5 credits

Swedish course title: Igenkänning av tal och talare

The course objective is to provide a systematic introduction to speech processing and recognition. Models of speech production and speech analysis form the basis for understanding the problem of speech recognition. Probabilistic machine learning methods are used for the recognition task, including Hidden Markov Models, Gaussian Mixture Models, Support Vector Machines and Deep Neural Networks.

  • Education cycle: Second cycle
  • Main field of study: Computer Science and Engineering
  • Grading scale: A, B, C, D, E, FX, F

Course offerings

Information for research students about course offerings

PhD students can take the doctoral version of the course, course code 2F5118, which requires a more extensive project than DT2119.

Intended learning outcomes

After completion of the course, the student should be able to

  • use the methods described in the course to recognise speech or speakers
  • configure a system for a given application
  • adapt and develop existing systems for speech and speaker recognition
  • evaluate systems for speech and speaker recognition
  • carry out research in the area.

Course main content

The course consists of lectures, three laboratory sessions with hand-in assignments, and a thesis on a subject chosen in consultation with the teacher. The thesis is also presented orally at a final seminar. In the laboratory sessions, students design different parts of a speech recognition application, train the system and evaluate its performance.

The following theoretical components are included:

  • algorithms for training, recognition and adaptation to the properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
  • methods to reduce sensitivity to disturbances and deviations
  • probability theory
  • signal processing and parameter extraction
  • acoustic modelling of the static and dynamic spectral properties of the speech sounds
  • statistical modelling of language in spontaneous and formal speech
  • search strategies: basic methods and strategies for large vocabularies
  • specific analysis and decision-making methods for speaker recognition.

The course also gives practical insight into building an application, including implementing functions based on prototypes and testing them on real speech data.

Specific prerequisites

For non-programme students, 90 credits are required, of which 45 credits must be in mathematics or computer science. English B or equivalent is also required.

Recommended prerequisites

Some knowledge of machine learning, for example from DD2421, DD2434 or EN2202

Some programming experience, preferably in Python

Some knowledge of signal processing

Literature

  • Huang, X., Acero, A., Hon, H.-W. Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.
  • Yu, D., Deng, L. Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015. The PDF can be downloaded through the KTH Library.
  • Research articles in speech recognition

Examination

  • LAB1 - Computer Lab, 4.5 credits, grading scale: P, F
  • PRO1 - Project, 3.0 credits, grading scale: A, B, C, D, E, FX, F

Requirements for final grade

  • Laboratory exercises
  • Written assignments
  • Thesis with presentation at a final seminar
  • Assessment of two other course participants' theses and critical review of their presentations.

Offered by

EECS/Intelligent Systems


Giampiero Salvi, tel: 790 7894, e-mail: giampi@kth.se


Giampiero Salvi <giampi@kth.se>

Supplementary information

The course may be cancelled or given in another form if too few students register.


Course syllabus valid from: Spring 2019.
Examination information valid from: Spring 2019.