FDT3119 Speech and Speaker Recognition 7.5 credits

Automatic Speech Recognition (ASR) is concerned with the problem of transcribing spoken words and phrases into text. ASR functionality is usually integrated into a larger system that makes it possible for humans to interact with computers using natural language. From a technical point of view, the ASR problem poses a number of challenges, arising from the need to deal with real-life signals produced by different individuals under different conditions. The solutions are usually based on statistical modelling and machine learning. This course gives insights into the signal processing and statistical methods employed in ASR and in speaker identification.

Information per course offering

Term

Spring 2026

Information for Spring 2026 Start 16 Mar 2026 programme students

Course location: KTH Campus
Duration: 16 Mar 2026 - 1 Jun 2026
Periods: Spring 2026: P4 (7.5 hp)
Pace of study: 50%
Application code: 11046
Form of study: Normal Daytime
Language of instruction: English
Course memo: Course memo is not published
Number of places: Places are not limited

Course syllabus FDT3119 (Spring 2019–)

Content and learning outcomes

Course contents

The course consists of lectures, three laboratory sessions with hand-in assignments, and a written thesis on a subject chosen in consultation with the teacher. The thesis is also presented orally during a final seminar. The laboratory sessions consist of designing different parts of a speech recognition application, training the system, and evaluating its performance.

The following theoretical components are included:

algorithms for training, recognition, and adaptation to the properties of speakers and transmission channels, including pattern recognition, Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs)
methods to reduce sensitivity to disturbances and deviations
probability theory
signal processing and parameter extraction
acoustic modelling of the static and dynamic spectral properties of speech sounds
statistical modelling of language in spontaneous and formal speech
search strategies: basic methods and strategies for large vocabularies
specific analysis and decision making methods for recognition of speakers.

Furthermore, the course gives some practical insight into building an application. This includes implementing certain functions based on prototypes and testing them on real speech data.
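
As an illustration of the kind of function such an implementation exercise might involve, the following is a minimal sketch of the forward algorithm for an HMM, computed in the log domain. It is not taken from the course material: the function names `logsumexp` and `forward_log_likelihood`, the argument layout and the toy numbers are all assumptions made for this example.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(log_pi, log_A, log_B):
    """Log-likelihood of an observation sequence under an HMM.

    log_pi[i]   : log prior probability of starting in state i
    log_A[i][j] : log transition probability from state i to state j
    log_B[t][j] : log emission likelihood of frame t in state j
    (hypothetical argument layout, chosen for this sketch)
    """
    n_states = len(log_pi)
    # Initialisation: prior times emission likelihood of the first frame.
    alpha = [log_pi[j] + log_B[0][j] for j in range(n_states)]
    # Recursion: propagate through the transitions, then apply the emission.
    for t in range(1, len(log_B)):
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n_states)])
            + log_B[t][j]
            for j in range(n_states)
        ]
    # Termination: sum over all states at the last frame.
    return logsumexp(alpha)


# Toy two-state model with two observed frames (made-up numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.6], [0.4, 0.3]]  # B[t][j]: likelihood of frame t in state j

log_likelihood = forward_log_likelihood(
    [math.log(p) for p in pi],
    [[math.log(a) for a in row] for row in A],
    [[math.log(b) for b in row] for row in B],
)
```

Working with log probabilities avoids numerical underflow on long utterances, which is why the sums over states go through `logsumexp` rather than being computed directly.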

Intended learning outcomes

After completion of the course the students should be able to:

implement training and evaluation methods for speech recognition
train and evaluate a speech recogniser using software packages
compare different feature extraction and training methods
document and discuss specific aspects related to speech and speaker recognition
with the help of the literature, review and criticise other students' work in the subject
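
The syllabus does not name a specific evaluation metric. As a hedged illustration of what evaluating a recogniser can look like, the sketch below computes the word error rate (WER), a commonly used measure, from the word-level edit distance between a reference transcription and a recogniser hypothesis. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one insertion against three reference words.
wer = word_error_rate("the cat sat", "the dog sat down")
```

Because insertions are counted, the WER can exceed 1.0 when the hypothesis is much longer than the reference.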

Literature and preparations

Specific prerequisites

Doctoral students from EECS

Recommended prerequisites

Some knowledge of machine learning, for example from DD2421, DD2434 or EN2202

Some programming knowledge, preferably in Python

Some knowledge of signal processing

Literature

Huang, X., Acero, A., Hon, H.-W. Spoken Language Processing - A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.

Yu, D., Deng, L. Automatic Speech Recognition: A Deep Learning Approach, Springer, 2015. The PDF can be downloaded through the KTH Library.

Research articles in speech recognition

Examination and completion

Grading scale

P, F

Examination

EXA1 - Exam, 7.5 credits, grading scale: P, F

Based on recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with documented disability.

The examiner may apply another examination format when re-examining individual students.

If the course is discontinued, students may request to be examined during the following two academic years.

LAB1 - Computer Lab, 4.5 credits, grading scale: P, F

PRO1 - Project, 3.0 credits, grading scale: P, F

Other requirements for final grade

Laboratory exercises with oral presentation

Research project with written report

Examiner

Jonas Beskow

Ethical approach

All members of a group are responsible for the group's work.
In any assessment, every student shall honestly disclose any help received and sources used.
In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Course room in Canvas

Registered students find further information about the implementation of the course in the course room in Canvas. A link to the course room can be found under the tab Studies in the Personal menu at the start of the course.

Offered by

EECS/Speech, Music and Hearing

Main field of study

This course does not belong to any Main field of study.

Education cycle

Third cycle

Supplementary information

The course is run in parallel with DT2119. The PhD version of the course requires a larger research project, to be agreed with the course coordinator.
