FDD3342 Knowledge Discovery and Data Mining 6.0 credits

Swedish title: Databrytning

The proliferation of IT and automatic surveillance applications means that vast volumes of data are gathered, for operational and other purposes, in computer applications and databases. In business, there is increasing awareness that the data collected constitute an untapped resource of business knowledge that can be used to commercial advantage. In the sciences, the data collected in large-scale experiments have rapidly outgrown the communities' resources for analysing them. The practice of using advanced 'semi-intelligent' systems to systematically 'mine' data repositories and create knowledge is known as knowledge discovery and data mining.
 
In this course we introduce the main methods used in the area, supported by techniques originating in the database, artificial intelligence, statistics and visualization fields. We particularly cover the theoretical and practical problems of identifying 'true and interesting' knowledge as opposed to erroneous, random and uninteresting knowledge, a problem much studied in statistics and database practice.

Offering and execution

Course offering missing for current semester as well as for previous and coming semesters

Course information

Content and learning outcomes

Course contents *

KDD philosophy
Bayes' rule and its interpretation as an inference tool
Learnability, VC-dimension 
Statistical techniques: multivariate (MV) analysis, singular value decomposition (SVD), etc.
Classification and clustering
Bayesian networks and graphical models
Prediction and sequence mining
Markov Chain Monte Carlo methods
Support vector and kernel methods

Intended learning outcomes *

After passing this course, you will:

  • know the fundamental approaches to knowledge discovery and data mining, the main theoretical foundations, as well as its code of practice,
  • know about several tools in the area and be able to use at least one,
  • be able to follow research and development in the area,
  • be able to assess the applicability of the technology for a particular scientific problem area, and develop the scientific methods used.

Literature and preparations

Recommended prerequisites

Students or doctoral students who have passed introductory courses in programming and statistics.

Literature

Course compendium and research articles.

Examination and completion

If the course is discontinued, students may request to be examined during the following two academic years.

Grading scale *

P, F

Examination *

Based on a recommendation from KTH’s coordinator for disabilities, the examiner will decide how to adapt an examination for students with a documented disability.

The examiner may apply another examination format when re-examining individual students.

Examination is individual and can consist of presentations of texts, presentations in class, homework and/or a small project.
A list of the papers you read for the course, preferably with comments, and proposals for course improvement should be turned in.

Examiner

Jens Lagergren

Ethical approach *

  • All members of a group are responsible for the group's work.
  • In any assessment, every student shall honestly disclose any help received and sources used.
  • In an oral assessment, every student shall be able to present and answer questions about the entire assignment and solution.

Further information

Offered by

EECS/Computational Science and Technology

Education cycle *

Third cycle

Contact

Jens Lagergren, email: jensl@kth.se, phone: 55378570

Supplementary information

The course can be taken as a reading course, but it is also taught jointly with DD2447 Statistical Methods in Computer Science.

Postgraduate course

Postgraduate courses at EECS/Computational Science and Technology