Before choosing course

Automatic Speech Recognition (ASR) is concerned with the problem of transcribing spoken words and phrases into text. ASR functionality is usually integrated into a larger system that makes it possible for humans to interact with computers using natural language. From a technical point of view, ASR poses a number of challenges that arise from the need to deal with real-life signals produced by different individuals under different conditions. The solutions are usually based on statistical modelling and machine learning. This course gives insights into the signal processing and statistical methods employed in ASR and in speaker identification.

Course offering missing for current semester as well as for previous and coming semesters
* Retrieved from Course syllabus FDT3119 (Spring 2019–)

Content and learning outcomes

Course contents

The course consists of lectures, three laboratory sessions with hand-in assignments, and a thesis written on a subject chosen in consultation with the teacher. The thesis is also presented orally at a final seminar. In the laboratory sessions, students design different parts of a speech recognition application, train the system and evaluate its performance.

The following theoretical components are included: 

  • algorithms for training, recognition and adaptation to the properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
  • methods to reduce sensitivity to disturbances and deviations
  • probability theory 
  • signal processing and parameter extraction 
  • acoustic modelling of the static and dynamic spectral properties of speech sounds
  • statistical modelling of language in spontaneous and formal speech
  • search strategies: basic methods and strategies for large vocabularies
  • specific analysis and decision making methods for recognition of speakers.

The course also gives practical insight into building an application, including implementing selected functions from prototypes and testing them on real speech data.
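
As an illustration of the kind of function involved in the HMM recognition component listed above, the following is a minimal sketch of Viterbi decoding in the log domain. It is not taken from the course material: the function name `viterbi`, its argument layout and the two-state toy model are all assumptions made for this example.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return the most likely state path and its log-probability.

    log_init[i]     : log P(first state = i)
    log_trans[i][j] : log P(next state = j | current state = i)
    log_obs[t][j]   : log-likelihood of frame t given state j
    (hypothetical argument layout, chosen for this sketch)
    """
    n_states = len(log_init)
    # delta[j]: best log score of any path that ends in state j at the current frame
    delta = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    backpointers = []
    for t in range(1, len(log_obs)):
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for state j at frame t
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_obs[t][j])
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state model with three observation frames (made-up numbers)
log = math.log
toy_init = [log(0.6), log(0.4)]
toy_trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
toy_obs = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]
toy_path, toy_score = viterbi(toy_init, toy_trans, toy_obs)
print(toy_path, math.exp(toy_score))
```

Working in the log domain avoids numerical underflow on long utterances. A practical recogniser would add pruning (beam search) for large vocabularies, which this sketch omits.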

Intended learning outcomes

After completion of the course the students should be able to:

  • implement training and evaluation methods for speech recognition
  • train and evaluate a speech recogniser using software packages
  • compare different feature extraction and training methods
  • document and discuss specific aspects related to speech and speaker recognition
  • with the help of the literature, review and criticise other students' work in the subject
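
To make the evaluation outcome above concrete, here is a minimal sketch of word error rate (WER), a metric commonly used to score recognisers. The syllabus does not state which metric is used in the labs, so the choice of WER and the function name `word_error_rate` are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One word of a six-word reference is missing from the hypothesis
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(example_wer)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.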

Course Disposition

12 lectures, 3 labs, final project 

Literature and preparations

Specific prerequisites

Doctoral students from EECS 

Recommended prerequisites

Some knowledge of machine learning, for example from DD2421, DD2434 or EN2202

Some programming knowledge, preferably in Python

Some knowledge of signal processing

Equipment

No information inserted

Literature

Huang, X., Acero, A., Hon, H.-W. Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.

Yu, D., Deng, L. Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015. You can download the PDF through KTH Library.

Research articles in speech recognition 

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale

P, F

Examination

  • EXA1 - Exam, 7.5 credits, grading scale: P, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

LAB1 - Computer Lab, 4.5 credits, grading scale: P, F

PRO1 - Project, 3.0 credits, grading scale: P, F

Other requirements for final grade

Laboratory exercises with oral presentation

Research project with written report 

Opportunity to complete the requirements via supplementary examination

No information inserted

Opportunity to raise an approved grade via renewed examination

No information inserted

Examiner

Giampiero Salvi

Ethical approach

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course web

Further information about the course can be found on the Course web at the link below. Information on the Course web will later be moved to this site.

Course web FDT3119

Offered by

EECS/Speech, Music and Hearing

Main field of study

No information inserted

Education cycle

Third cycle

Add-on studies

No information inserted

Supplementary information

The course is run in parallel with DT2119. The PhD version of the course requires a larger research project to be agreed with the course responsible.

Postgraduate course

Postgraduate courses at EECS/Speech, Music and Hearing