Syntax and semantics for programming languages that are particularly suited for data science, e.g., Python.
Routines to import, combine, convert and select data.
Algorithms for handling missing values, discretisation and dimensionality reduction.
Algorithms for supervised machine learning, e.g., naïve Bayes, decision trees, random forests.
Algorithms for unsupervised machine learning, e.g., k-means clustering.
Libraries for data analysis.
Evaluation methods and performance metrics.
Visualisation and analysis of the results of data analysis.
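As an illustration of how some of the topics above fit together, the following is a minimal Python sketch, not course material. It uses only the standard library and shows mean imputation of missing values, k-means clustering and accuracy as a performance metric. The function names (`impute_mean`, `kmeans`, `accuracy`), the toy data and the choice of the first k points as initial centroids are assumptions made for this example; in practice a data analysis library would normally be used instead.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Plain k-means. Assumption: the first k points are the initial centroids,
    which keeps the example deterministic; libraries use better initialisation."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels, centroids


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data with one missing value (None).
raw = [[1.0, 2.0], [1.5, None], [8.0, 9.0], [9.0, 8.5]]
clean = impute_mean(raw)
labels, centroids = kmeans(clean, k=2)

# Hypothetical true and predicted class labels for a supervised model.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

In this sketch the missing value is filled with the column mean before clustering, since the distance computation in k-means requires complete numeric rows. Accuracy is shown on separate, made-up labels because cluster indices are arbitrary and are not directly comparable to class labels.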
After passing the course, the student should be able to:
- account for and discuss the application of i) technologies to convert data to an appropriate format for data analysis, ii) algorithms to analyse data through supervised and unsupervised machine learning, and iii) technologies and performance metrics for evaluation of data analysis results
- implement and apply i) technologies to convert data to an appropriate format for data analysis, ii) algorithms for supervised and unsupervised machine learning, and iii) technologies and performance metrics for evaluation of data analysis results.