Support Vector Machines.
A binary classifier. Its objective is to find the hyperplane that optimally separates the two classes, i.e. the one that maximises the margin.
A hard margin can be used when the data is linearly separable.
The decision function for a linear hard-margin SVM is:
f(x) = sign(w^T x + b)
All positive examples should have w^T x_i + b >= 1 and all negative examples should have w^T x_i + b <= -1, i.e. y_i (w^T x_i + b) >= 1 for every example.
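As a minimal sketch (toy 2-D data and a hand-picked hyperplane, both illustrative), the hard-margin constraint y_i (w^T x_i + b) >= 1 can be checked directly with NumPy:

```python
import numpy as np

# Toy linearly separable data; labels in {-1, +1} (illustrative values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane (hand-picked for this toy data).
w = np.array([1.0, 1.0])
b = 0.0

# Hard-margin constraint: y_i * (w . x_i + b) >= 1 for every example.
margins = y * (X @ w + b)
print(margins)               # functional margin of each point
print(np.all(margins >= 1))  # True iff the hard-margin constraints hold
```

Points with margin exactly 1 lie on the margin boundary; these are the support vectors.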
A soft margin is necessary when the data is not linearly separable: slack is allowed, but margin violations are penalised.
The loss function for a linear soft-margin SVM is:
L(w, b) = (1/2) ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
Where w and b are parameters to be learnt and C is a hyperparameter controlling the trade-off between margin width and misclassification.
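A sketch of this loss as a function, assuming labels y_i in {-1, +1} (the data, weights, and C value below are illustrative):

```python
import numpy as np

def soft_margin_loss(w, b, X, y, C=1.0):
    """Soft-margin SVM loss: 0.5 * ||w||^2 + C * sum of hinge losses."""
    # Hinge loss is zero for points beyond the margin, linear inside it.
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * np.sum(hinge)

X = np.array([[2.0, 2.0], [-1.0, -1.0]])
y = np.array([1, -1])
w = np.array([0.5, 0.5])
b = 0.0

print(soft_margin_loss(w, b, X, y, C=1.0))
```

A larger C penalises margin violations more heavily, pushing the solution back towards a hard margin.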
A kernel is used to map the data into a higher-dimensional space in which it is easier to separate linearly. Because the optimisation only ever needs inner products between examples, the mapping never has to be computed explicitly. This is known as the kernel trick.
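For example, the RBF (Gaussian) kernel computes inner products in an infinite-dimensional feature space while only ever touching pairwise distances in the input space (the data below is illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2).

    Returns the Gram matrix of implicit feature-space inner products
    without ever constructing the feature mapping explicitly.
    """
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2 * X1 @ X2.T)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, gamma=1.0)
print(K)  # 2x2 Gram matrix; diagonal entries are exactly 1
```

Any positive semi-definite Gram matrix K can be plugged into the SVM dual in place of raw inner products.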
- The optimisation problem is convex, so local optima are not a problem.
- Cannot naturally learn multiclass classification problems. Applying an SVM to these requires reformulating the problem as a series of binary classification tasks, either one-vs-all or one-vs-one tasks. Learning these separately is inefficient and poor for generalisation.
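A minimal sketch of the one-vs-all reformulation (toy labels only; no actual SVM is trained here):

```python
import numpy as np

# One-vs-all: a K-class problem becomes K binary problems, each
# separating one class (+1) from all the others (-1).
y = np.array([0, 1, 2, 1, 0, 2])  # toy multiclass labels (illustrative)
classes = np.unique(y)

binary_tasks = {c: np.where(y == c, 1, -1) for c in classes}
for c, yb in binary_tasks.items():
    print(c, yb)  # one binary SVM would be trained per relabelling
```

One-vs-one instead trains K(K-1)/2 classifiers, one per pair of classes, and predicts by voting.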