DD2418 Language Engineering 6.0 credits
The course in language technology covers different methods for the analysis, generation, and filtering of human language, especially text. Rule-based and statistical methods are studied and used in applications such as information retrieval, spelling and grammar checking, and text summarization.
The course covers theory, methods, and application areas within language technology. The course requirements are an examination, laboratory assignments, and a home assignment.
Educational level: Second cycle
Academic level (A-D)
Subject area: Computer Science and Engineering
Information and Communication Technology
Grade scale: A, B, C, D, E, FX, F
Autumn 17 P2 (6.0 credits)
2017 week: 44
2018 week: 3
Language of instruction
Number of lectures
Number of exercises
Form of study
Number of places
P2: A1, F1, I1, A2, I2.
Johan Boye <firstname.lastname@example.org>
Viggo Kann <email@example.com>
Searchable for all students from year 3 and for students admitted to a master programme.
Part of programme
- Master of Science in Engineering and in Education, year 4, TIKT, Conditionally Elective
- Master's Programme, Computer Science, 120 credits, year 1, CSCS, Recommended
- Master's Programme, Computer Science, 120 credits, year 1, CSDA, Conditionally Elective
- Master's Programme, Computer Science, 120 credits, year 2, CSCS, Recommended
- Master's Programme, Computer Science, 120 credits, year 2, CSDA, Conditionally Elective
- Master's Programme, Machine Learning, 120 credits, year 1, Conditionally Elective
- Master's Programme, Machine Learning, 120 credits, year 2, Conditionally Elective
- Master's Programme, Systems, Control and Robotics, 120 credits, year 2, Recommended
Intended learning outcomes
After the course, the students should have the knowledge to:
- explain and use general concepts within the following levels of linguistics: morphology, syntax, semantics, discourse, and pragmatics,
- use the knowledge about morphology, syntax, and lexical semantics in order to develop systems, and explain existing systems using these levels,
- clarify the differences between analysis, generation, and filtering in text-based systems,
- use general language technology tools and resources, such as part-of-speech taggers, chunkers, corpora, and lexica in order to build new applications,
- explain and use standard methods based on rules, statistics, and machine learning,
- apply methods based on finite automata/transducers, context-free grammars, word frequencies, n-grams, co-occurrence statistics, Markov models, and vector space models,
- analyze and explain which problems within language technology can be solved with usable results, and which cannot,
- give details of how spelling and grammar checkers, taggers based on machine learning, stemmers, and an algorithm for semantic content acquisition work,
- design and carry out a simple evaluation of a language technology system, and interpret the results,
- independently solve a well-defined practical language technology problem, or analyze a problem theoretically,
in order to be able to:
- work for a language technology company,
- continue with studies in language technology,
- work with a master’s project in computer science or human-computer interaction with a focus on language technology,
- be an important link between systems designers, programmers, and interaction designers in industry as well as in research projects.
Course main content
The history and basics of language technology, morphology, syntax, and semantics, vector space models, evaluation methods, the principles and methods of terminology work, machine learning, information theory and Markov models, algorithms and data structures for efficient lexicon handling.
Morphological analysis and generation, statistical methods in corpus linguistics, parsing and generation, part-of-speech tagging, named entity recognition and probabilistic parsing, statistical lexical semantics.
Spelling and grammar checking, information retrieval, word prediction for smart text entry, text clustering and text categorization, computer-assisted language learning, dialogue systems, text summarization, speech technology, localization and internationalization.
Single course students: 90 university credits including 45 university credits in Mathematics or Information Technology. Swedish B or equivalent and English A or equivalent.
One of the courses DD1320/DD1321 Applied Computer Science, DD1340 Introduction to Computer Science, DD1343 Computer Science and Numerical Methods, part 1, DD1344 Fundamentals of Computer Science, or DD1346 Object-Oriented Program Construction, plus SF1906 Mathematical Statistics, or equivalent. Knowledge of formal languages corresponding to DD2488 Compiler Construction or DD1361 Programming Paradigms is useful but not necessary.
The course literature will be announced on the course web page no later than 4 weeks before the course starts. In the previous course round, Jurafsky & Martin, Speech and Language Processing, was used, together with material produced at the department.
- INL1 - Hand-ins, 1.5, grade scale: A, B, C, D, E, FX, F
- LAB1 - Laboratory Work, 1.5, grade scale: P, F
- TEN1 - Examination, 3.0, grade scale: A, B, C, D, E, FX, F
In this course, all the regulations of the code of honor at the School of Computer Science and Communication apply, see: http://www.kth.se/csc/student/hederskodex/1.17237?l=en_UK.
CSC/Speech, Music and Hearing
Johan Boye, e-mail: firstname.lastname@example.org
Johan Boye <email@example.com>
Please discuss with the instructor.
DD2476 Search Engines and Information Retrieval Systems, and DT2112 Speech technology are possible follow-ups.
Course syllabus valid from: Autumn 2012.
Examination information valid from: Autumn 2012.