Speech and Speaker Recognition


The course web has been closed for editing since 1 June 2025 and will be shut down completely on 1 October 2026. Depending on how the course web has been used, there are a few alternatives to replace it:

The "Before course selection" page in About course
Course memo (in About course)
Public space in Canvas

If you would like more information about this, contact e-learning@kth.se.

NEW in VT2020:

most of the lectures will be given through live streaming
Canvas URL: https://kth.instructure.com/courses/17109

NEW in VT2019:

clearer grading criteria
updates and corrections in the labs
Canvas URL: https://kth.instructure.com/courses/7539

NEW in VT2018:

Lab 3 has been redesigned to use Python and TensorFlow
Labs 1 and 2 have been updated
Canvas URL: https://kth.instructure.com/courses/5254

NEW in VT2017:

The course code has been changed to DT2119
The new course has final grades on an A-F scale
The course activities and material are available on Canvas: https://kth.instructure.com/courses/1730

NEW in VT2016: the course has been redesigned with:

third lab updated with Deep Learning for Speech Recognition
collaboration with PDC for computing resources

NEW in VT2015: the course has been redesigned with:

three new laboratory exercises
new lecture on Deep Learning for Speech Recognition

Automatic Speech Recognition (ASR) is concerned with the problem of transcribing spoken words and phrases into text. ASR functionality is usually integrated into a larger system that makes it possible for humans to interact with computers using natural language. From a technical point of view, the ASR problem poses a number of challenges that arise from the need to deal with real-life signals produced by different individuals under different conditions. The solutions are usually based on statistical modeling and machine learning.

This course gives insights into the signal processing and statistical methods employed in ASR and in speaker identification.

Topics

Speech recognition, speech production, speech analysis, features, statistical modeling of sequences, hidden Markov models, deep neural networks, search algorithms, language models, speaker identification.

PhD Students

The course can also be taken at the doctoral level with course number FDT3119. The extra requirements for doctoral-level credits will be discussed on an individual basis. Please contact doctoral-education-support@eecs.kth.se to sign up for the course.
