Keywords to remember in the course

[Warning: this is for a previous year's syllabus and not really relevant anymore.]

This is not a complete list.

1. K-nearest neighbour: Predict by majority vote (classification) or average (regression) over the K nearest training samples.

2. Decision trees: Entropy = unpredictability = - sum p log p. At every node, choose the split that maximizes the information gain (= reduction in entropy). (Gini impurity = sum p(1-p) is an alternative measure.) Cross-validate and prune the last nodes.

3. Bayesian Inference: The process of calculating the posterior probability distribution P(y | x) for given data x.

4. Bayesian Learning: The process of learning the likelihood distribution P(x | y) and prior probability distribution P(y) from a set of training points.

5. Likelihood: P(x|y)

6. A posteriori (posterior): P(y|x) proportional to P(x|y) P(y)

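7. Boosting: Aggregating many weak classifiers. AdaBoost: alpha = 1/2 log((1-eps)/eps), where eps is the weighted training error of the weak classifier. Multiply the weights of misclassified samples by exp(alpha) and those of correctly classified samples by exp(-alpha), renormalize, then use the alpha-weighted vote of the classifiers.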

8. Bagging: Draw bootstrap samples (with replacement) from the training set, train a classifier on each, and average their predictions (or take a majority vote).

9. Concept: c, a true/false labelling of every x in X.

10. Hypothesis space: The set H of all candidate true/false labellings h that the learner may choose from (fixed before data arrives).

11. True error of hypothesis: Probability that hypothesis h gives the wrong classification for a datapoint drawn at random from the data distribution.

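12. Probably Approximately Correct: How many training samples m are needed so that, with probability at least 1 - delta, every hypothesis consistent with the training data has true error at most eps. The probability that some consistent hypothesis has true error above eps is at most |H|*(1-eps)^m <= |H|*exp(-eps*m); requiring this to be at most delta gives m >= (ln|H| + ln(1/delta))/eps.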

13. VC dimension: The size of the largest set of points that H can shatter, i.e. for which every possible true/false labelling (every subset) is realized by some hypothesis h in H.

14. Naive Bayes: Assume the features are conditionally independent given the class; predict by maximising the a posteriori probability (not the likelihood).

15. Logistic regression: Regression to a probability: P(y=1 | x) = 1/(1 + exp(-(w.x + b))).

16. Discriminative vs Generative model: A discriminative model learns the posterior P(y|x) directly. A generative model learns P(x|y) and P(y), then obtains the posterior via Bayes' rule.

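17. Bayesian approach: Get the posterior over parameters P(theta | training data) and integrate over the hypothesis space to get the predictive distribution P(y|x) = integral of P(y|x, theta) P(theta | training data) dtheta.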
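
Example for item 2 (decision trees): a minimal sketch of the impurity measures and the information gain of one split. The function names and the toy labels are made up for illustration, and log base 2 (entropy in bits) is an assumption; the notes do not fix a base.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum p * log2(p) over the class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: sum p * (1 - p) over the class frequencies."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# Toy node with 5 "yes" and 5 "no" samples, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(entropy(parent))
print(gini(parent))
print(information_gain(parent, split))
```

A tree learner would evaluate information_gain for every candidate split at a node and keep the split with the largest value.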
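
Example for items 5 and 6 (likelihood and posterior): a small sketch of "posterior proportional to likelihood times prior", normalized so that it sums to 1. The class names and all the numbers are invented for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule for one observed x.

    prior:      dict mapping class y -> P(y)
    likelihood: dict mapping class y -> P(x | y) for the observed x
    Returns a dict mapping class y -> P(y | x).
    """
    unnormalized = {y: likelihood[y] * prior[y] for y in prior}
    evidence = sum(unnormalized.values())  # P(x), the normalizing constant
    return {y: v / evidence for y, v in unnormalized.items()}

# Hypothetical numbers: P(y) and P(x | y) for a single observed x.
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {"spam": 0.6, "ham": 0.1}

post = posterior(prior, likelihood)
print(post)
```

Here the unnormalized scores are 0.12 and 0.08, so x is assigned to "spam" even though "ham" has the larger prior.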
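
Example for item 7 (AdaBoost): a sketch of a single reweighting round. It assumes one common convention, alpha = 1/2 log((1-eps)/eps) with weights multiplied by exp(alpha) for misclassified and exp(-alpha) for correctly classified samples; other texts use alpha without the factor 1/2 and only reweight the misclassified samples. The function name and the toy data are hypothetical.

```python
import math

def adaboost_round(weights, misclassified):
    """One AdaBoost reweighting step.

    weights:       current sample weights, summing to 1
    misclassified: list of bools, True where the weak classifier was wrong
    Returns (alpha, new_weights) with new_weights renormalized to sum to 1.
    """
    # Weighted training error of the weak classifier.
    eps = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0 < eps < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Increase the weight of mistakes, decrease the weight of correct samples.
    unnormalized = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    z = sum(unnormalized)
    return alpha, [w / z for w in unnormalized]

# Five equally weighted samples; the weak classifier gets only the first wrong.
weights = [0.2] * 5
misclassified = [True, False, False, False, False]
alpha, new_weights = adaboost_round(weights, misclassified)
print(alpha, new_weights)
```

With this convention the misclassified samples together carry half of the total weight after the update, which forces the next weak classifier to focus on them.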
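
Example for item 12 (PAC): a sketch of the sample-size bound for a finite hypothesis space, using (1-eps)^m <= exp(-eps*m) to solve |H|*exp(-eps*m) <= delta for m. The function name and the chosen numbers are assumptions for illustration.

```python
import math

def pac_sample_bound(h_size, eps, delta):
    """Smallest integer m satisfying m >= (ln|H| + ln(1/delta)) / eps."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# Hypothetical setting: |H| = 1024 hypotheses, error at most 0.1,
# failure probability at most 0.05.
m = pac_sample_bound(1024, eps=0.1, delta=0.05)
print(m)

# Sanity check against the tighter form |H| * (1 - eps)^m.
print(1024 * (1 - 0.1) ** m <= 0.05)
```

The bound grows only logarithmically in |H| and 1/delta but linearly in 1/eps, so halving the tolerated error roughly doubles the number of samples needed.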