DD2418 Language Engineering 6.0 credits

Språkteknologi

Please note

The information on this page is based on a course syllabus that is not yet valid.

The course in language technology covers methods for the analysis, generation, and filtering of human language, especially text. Rule-based and statistical methods are studied and used in applications such as information retrieval, spelling and grammar checking, and text summarization. The course covers theory, methods, and application areas within language technology. The course requirements are an examination, laboratory assignments, and a home assignment.
  • Education cycle

    Second cycle
  • Main field of study

    Computer Science and Engineering
    Information Technology
    Information and Communication Technology
  • Grading scale

    A, B, C, D, E, FX, F

Course offerings

Autumn 18 sprakt18 for programme students

Spring 19 spraktv19 for programme students

Intended learning outcomes

After the course, students should be able to:

  • explain and use concepts in the following levels of linguistics: morphology, syntax, semantics, discourse and pragmatics,
  • apply knowledge of morphology, syntax and lexical semantics to develop language engineering systems, as well as explain the structure of existing systems using these levels,
  • clarify the differences between analysis, generation, and filtering in text-based systems
  • use basic tools in language engineering, such as part-of-speech taggers and chunkers, as well as various types of corpora and dictionaries, in order to build their own programs,
  • explain and use standard methods in language technology that are based on rules, statistics, and machine learning,
  • practically apply methods that are based on finite automata/transducers, context free grammars, word frequencies, n-grams, co-occurrence statistics, Markov models, and vector space models,
  • analyse and explain which language engineering problems can be solved with usable results and which lie beyond the research horizon,
  • explain in detail how spell checking, grammar checking, a machine-learning-based tagger, a stemmer, and an algorithm for statistical extraction of related words work,
  • design and carry out simple evaluations of a language engineering system and interpret the results,
  • independently solve a well-defined practical language technology problem, or analyze a problem theoretically,

to be able to:

  • work for language technology companies
  • continue with language technology oriented studies,
  • carry out a degree project in computer science or human-computer interaction with a language engineering specialisation,
  • be an important link between systems designers, programmers, and interaction designers in industry as well as in research projects.

Course main content

Theory:

The history and basics of language technology, morphology, syntax, and semantics, vector space models, evaluation methods, the principles and methods of terminology work, machine learning, information theory and Markov models, algorithms and data structures for efficient lexicon handling.

Methods:

Morphological analysis and generation, language statistics and corpus processing, parsing, generation, part-of-speech tagging, named entity recognition, probabilistic parsing, and statistical lexical semantics.

Application areas:

Spelling and grammar checking, information retrieval, word prediction for intelligent text entry, text clustering and text categorization, computer-assisted language learning, dialogue systems, text summarization, speech technology, localization and internationalization.

Disposition

Eligibility

For non-programme students, 90 credits are required, of which 45 credits have to be within mathematics or information technology. Furthermore, Swedish B and English A, or the equivalent, are required.

Recommended prerequisites

One of the courses DD1320/DD1321 Applied Computer Science, DD1340 Introduction to Computer Science, DD1343 Computer Science and Numerical Methods, part 1, DD1344 Fundamentals of Computer Science, or DD1346 Object-Oriented Program Construction, plus SF1906 Mathematical Statistics, or equivalent. Knowledge of formal languages corresponding to DD2488 Compiler Construction or DD1361 Programming Paradigms is useful but not necessary.

Literature

The reading list will be announced on the course webpage no later than 4 weeks before the start of the course. The previous course offering used Jurafsky & Martin, Speech and Language Processing, as well as material produced at the department.

Required equipment

Examination

  • INL1 - Hand-ins, 1.5 credits, grading scale: A, B, C, D, E, FX, F
  • LAB2 - Laboratory work, 4.5 credits, grading scale: A, B, C, D, E, FX, F

The course DD1418 has until now shared its examination with DD2418. Students who started DD2418 earlier but have not yet passed the exam can take the exam in DD1418. This option will remain available for two years.

Requirements for final grade

To receive a final course grade, the student must have passed the components INL1 and LAB2. The final grade will be a weighted mean of the grades on the two components.

Offered by

EECS/Intelligent Systems

Contact

Johan Boye, email: jboye@kth.se

Examiner

Johan Boye <jboye@kth.se>

Supplementary information

 

Add-on studies

Please discuss with the instructor.

DD2476 Search Engines and Information Retrieval Systems, and DT2112 Speech technology are possible follow-ups.

Version

Course syllabus valid from: Spring 2019.
Examination information valid from: Spring 2019.