Speech and Speaker Recognition

Speak Now

NEW in VT2020:

NEW in VT2019:

NEW in VT2018:

NEW in VT2017:

NEW in VT2016: the course has been redesigned with:

  • third lab updated with Deep Learning for Speech Recognition
  • collaboration with PDC for computing resources

NEW in VT2015: the course has been redesigned with:

  • three new laboratory exercises
  • new lecture on Deep Learning for Speech Recognition

Automatic Speech Recognition (ASR) is concerned with the problem of transcribing spoken words and phrases into text. ASR functionality is usually integrated into a larger system that makes it possible for humans to interact with computers using natural language. From a technical point of view, the ASR problem poses a number of challenges, arising from the need to handle real-life signals produced by different individuals under different conditions. The solutions are usually based on statistical modeling and machine learning.

This course gives insight into the signal processing and statistical methods employed in ASR and in speaker identification.

Topics

Speech recognition, speech production, speech analysis, features, statistical modeling of sequences, hidden Markov models, deep neural networks, search algorithms, language models, speaker identification.

PhD Students

The course can also be taken at the doctoral level with course number FDT3119. The extra requirements for doctoral-level credits will be discussed on an individual basis. Please contact doctoral-education-support@eecs.kth.se to subscribe to the course.

Teachers
