FDT3119 Speech and Speaker Recognition 7.5 credits

Automatic Speech Recognition (ASR) is concerned with transcribing spoken words and phrases into text. ASR is usually integrated into a larger system that lets humans interact with computers using natural language. From a technical point of view, ASR poses a number of challenges that arise from the need to handle real-life signals produced by different individuals under different conditions. Solutions are usually based on statistical modeling and machine learning. This course gives insight into the signal processing and statistical methods employed in ASR and in speaker identification.

Application

For course offering

Spring 2024, start 18 Mar 2024, for programme students

Application code

61041

Headings with content from the Course syllabus FDT3119 (Spring 2019–) are denoted with an asterisk (*).

Content and learning outcomes

Course contents

The course consists of lectures, three laboratory sessions with hand-in assignments, and a thesis on a subject chosen in consultation with the teacher. The thesis is also presented orally at a final seminar. In the laboratory sessions, students design different parts of a speech recognition application, train the system, and evaluate its performance.

The following theoretical components are included: 

  • algorithms for training, recognition, and adaptation to the properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
  • methods to reduce sensitivity to disturbances and deviations
  • probability theory 
  • signal processing and parameter extraction 
  • acoustic modelling of the static and dynamic spectral properties of speech sounds
  • statistical modelling of language in spontaneous and formal speech
  • search strategies: basic methods and strategies for large vocabularies
  • specific analysis and decision-making methods for speaker recognition.

The course also gives practical insight into building an application, including implementing functions based on prototypes and testing them on real speech data.

Intended learning outcomes

After completion of the course the students should be able to:

  • implement training and evaluation methods for speech recognition
  • train and evaluate a speech recogniser using software packages
  • compare different feature extraction and training methods
  • document and discuss specific aspects related to speech and speaker recognition
  • with the help of the literature, review and criticise other students' work in the subject

Literature and preparations

Specific prerequisites

Doctoral students from EECS 

Recommended prerequisites

Some knowledge of machine learning, for example from DD2421, DD2434 or EN2202

Some programming knowledge, preferably Python

Some knowledge of signal processing

Equipment

No information inserted

Literature

Huang, X., Acero, A., Hon, H.-W. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall, 2001.

Yu, D., Deng, L. Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015. The PDF can be downloaded through the KTH Library.

Research articles in speech recognition 

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

P, F

Examination

  • EXA1 - Exam, 7.5 credits, grading scale: P, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

LAB1 - Computer Lab, 4.5 credits, grading scale: P, F

PRO1 - Project, 3.0 credits, grading scale: P, F

Other requirements for final grade

Laboratory exercises with oral presentation

Research project with written report 

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted

Examiner

Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students find further information about the implementation of the course in the course room in Canvas. A link to the course room can be found under the tab Studies in the Personal menu at the start of the course.

Offered by

Main field of study

This course does not belong to any Main field of study.

Education cycle

Third cycle

Add-on studies

No information inserted

Supplementary information

The course is run in parallel with DT2119. The PhD version of the course requires a larger research project, to be agreed on with the person responsible for the course.

Postgraduate course

Postgraduate courses at EECS/Speech, Music and Hearing