# SVMs
Support Vector Machines (SVMs) are binary classifiers. Their objective is to find the hyperplane that optimally separates the two classes, i.e. the one that maximises the margin between them.
## Hard margin
Can be used when the data is linearly separable.

The decision function for a linear hard-margin SVM is:

$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$$

All positive examples ($y_i = +1$) should have $\mathbf{w}^\top\mathbf{x}_i + b \ge +1$ and all negative examples ($y_i = -1$) should have $\mathbf{w}^\top\mathbf{x}_i + b \le -1$.
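Since the margin width is $2/\lVert\mathbf{w}\rVert$, maximising the margin is equivalent to minimising $\lVert\mathbf{w}\rVert$, giving the hard-margin optimisation problem:

$$\min_{\mathbf{w},\, b} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^\top\mathbf{x}_i + b) \ge 1 \;\; \forall i$$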
## Soft margin
Necessary when the data is not linearly separable: some margin violations are allowed, but penalised.

The loss function for a linear soft-margin SVM is:

$$\frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\max\bigl(0,\; 1 - y_i(\mathbf{w}^\top\mathbf{x}_i + b)\bigr)$$

Where $\mathbf{w}$ and $b$ are the parameters to be learnt and $C$ is a hyperparameter controlling the trade-off between margin width and misclassification penalty. The $\max(0,\, 1 - y_i(\cdot))$ term is the hinge loss.
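This loss can be minimised directly by subgradient descent. A minimal sketch in numpy (the function name, learning rate, and toy data are illustrative assumptions, not part of the original notes):

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C * sum(hinge losses) by subgradient descent.

    Labels y must be in {-1, +1}. Illustrative sketch, not a production solver.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # examples that violate the margin contribute to the hinge term
        grad_w = w - C * (y[mask][:, None] * X[mask]).sum(axis=0)
        grad_b = -C * y[mask].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = train_soft_margin_svm(X, y)
preds = np.sign(X @ w + b)
```

With a large `C` this behaves like a hard-margin SVM; with a small `C` the regularisation term dominates and the margin widens at the cost of more violations.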
## Dual form
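A sketch of the standard soft-margin dual, obtained by introducing Lagrange multipliers $\alpha_i$:

$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\,\alpha_j\, y_i\, y_j\, \mathbf{x}_i^\top\mathbf{x}_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n}\alpha_i y_i = 0$$

The dual depends on the data only through the inner products $\mathbf{x}_i^\top\mathbf{x}_j$, which is what makes the kernel trick possible.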
## Primal form
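A sketch of the standard primal formulation, with slack variables $\xi_i$ measuring each example's margin violation:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^\top\mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0$$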
## Training
Training is a quadratic programming (QP) problem: the objective is quadratic and the constraints are linear. In practice it is typically solved with specialised algorithms such as Sequential Minimal Optimisation (SMO).
## Kernels
A kernel maps the data into a high-dimensional space in which it is easier to separate linearly. Because the dual objective depends on the data only through inner products, the mapping never has to be computed explicitly: the kernel function returns the inner product in the high-dimensional space directly. This is known as the kernel trick.
### Linear
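The standard form:

$$K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^\top\mathbf{x}'$$

Equivalent to using no kernel at all, i.e. working in the original feature space.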
### Polynomial
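The standard form (hyperparameter names $\gamma$, $r$, $d$ follow the common convention, e.g. scikit-learn's):

$$K(\mathbf{x}, \mathbf{x}') = (\gamma\, \mathbf{x}^\top\mathbf{x}' + r)^d$$

where $d$ is the polynomial degree.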
### Sigmoid
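The standard form:

$$K(\mathbf{x}, \mathbf{x}') = \tanh(\gamma\, \mathbf{x}^\top\mathbf{x}' + r)$$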
### RBF
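The standard form of the radial basis function (Gaussian) kernel:

$$K(\mathbf{x}, \mathbf{x}') = \exp\bigl(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2\bigr)$$

It corresponds to an inner product in an infinite-dimensional feature space. As a sketch, all four kernels can be written directly in numpy (the hyperparameter names `gamma`, `r`, `d` follow scikit-learn's convention, an assumption on my part):

```python
import numpy as np

def linear_kernel(x, z):
    # Plain inner product: equivalent to no kernel
    return x @ z

def polynomial_kernel(x, z, gamma=1.0, r=0.0, d=3):
    # Inner product raised to a power: features are all monomials up to degree d
    return (gamma * (x @ z) + r) ** d

def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    # tanh of a scaled inner product
    return np.tanh(gamma * (x @ z) + r)

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian of the squared Euclidean distance; K(x, x) is always 1
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
```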
## Advantages
- The optimisation problem is convex, so local optima are not a problem: any local optimum is the global one.
## Disadvantages
- Cannot naturally handle multiclass classification problems. Applying an SVM to these requires reformulating the problem as a series of binary classification tasks, either one-vs-all or one-vs-one. Learning these tasks separately is inefficient and can hurt generalisation.
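The one-vs-all (one-vs-rest) reformulation can be sketched generically: for $k$ classes, train $k$ binary scorers (class vs. rest) and predict the class whose scorer gives the highest score. A minimal illustration, using a perceptron as a stand-in binary scorer (in a real setting each binary problem would be an SVM; the function names and toy data are assumptions):

```python
import numpy as np

def train_binary(X, y, epochs=500):
    # Stand-in linear binary scorer (a perceptron); labels y in {-1, +1}
    Xb = np.hstack([X, np.ones((len(X), 1))])  # absorb bias into the weights
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:  # misclassified (or on the boundary)
                w += yi * xi
    return w

def ovr_fit(X, labels):
    # One binary problem per class: this class (+1) vs. all the rest (-1)
    classes = np.unique(labels)
    models = [train_binary(X, np.where(labels == c, 1.0, -1.0)) for c in classes]
    return classes, models

def ovr_predict(X, classes, models):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    scores = np.stack([Xb @ w for w in models], axis=1)  # shape (n, k)
    return classes[np.argmax(scores, axis=1)]

# Three well-separated toy clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
classes, models = ovr_fit(X, labels)
preds = ovr_predict(X, classes, models)
```

One-vs-one instead trains $k(k-1)/2$ pairwise classifiers and predicts by majority vote, trading more models for smaller per-model training sets.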