# SVMs¶

Support Vector Machines (SVMs) are binary classifiers. Their objective is to find the hyperplane that optimally separates the two classes, i.e. the one that maximises the margin between them.

## Hard margin¶

The hard-margin formulation can be used only when the data is linearly separable.

The decision function for a linear hard-margin SVM is:

$$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$$

All positive examples should have $\mathbf{w} \cdot \mathbf{x}_i + b \geq 1$ and all negative examples should have $\mathbf{w} \cdot \mathbf{x}_i + b \leq -1$.
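The decision rule above can be sketched in NumPy. The parameter values $\mathbf{w}$ and $b$ here are made up for illustration, not learnt:

```python
import numpy as np

# Illustrative (not learnt) parameters for a 2-D hard-margin SVM.
w = np.array([1.0, -1.0])
b = 0.5

def decision_function(x):
    """f(x) = w . x + b; the sign gives the predicted class."""
    return np.dot(w, x) + b

x_pos = np.array([2.0, 0.5])   # should satisfy f(x) >= +1
x_neg = np.array([-1.0, 1.5])  # should satisfy f(x) <= -1

print(decision_function(x_pos))  # → 2.0
print(decision_function(x_neg))  # → -2.0
```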

## Soft-margin¶

Necessary when the data is not linearly separable.

The loss function for a linear soft-margin SVM is:

$$L(\mathbf{w}, b) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b)\bigr)$$

Where $\mathbf{w}$ and $b$ are parameters to be learnt and $C$ is a hyperparameter that trades off margin width against misclassification penalty.
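A minimal sketch of computing this objective (hinge loss plus regulariser) in NumPy, on a tiny made-up dataset with labels in $\{-1, +1\}$:

```python
import numpy as np

def soft_margin_loss(w, b, X, y, C=1.0):
    """Soft-margin objective: 0.5 * ||w||^2 + C * sum of hinge losses.
    Labels y are expected in {-1, +1}."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * np.sum(hinge)

# Tiny illustrative dataset (made-up values).
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]])
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])
b = 0.0

print(soft_margin_loss(w, b, X, y))  # → 1.5
```

Only examples inside or on the wrong side of the margin (here the second point, with margin 0) contribute to the hinge term.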

## Dual form¶

In the dual form, the optimisation is rewritten in terms of one Lagrange multiplier $\alpha_i$ per training example. The data enter only through inner products $\mathbf{x}_i \cdot \mathbf{x}_j$, which is what makes the kernel trick possible; examples with $\alpha_i > 0$ are the support vectors.

## Primal form¶

The primal form optimises the soft-margin loss above directly over $\mathbf{w}$ and $b$. Its size scales with the number of features rather than the number of examples, so it is preferable when there are many more examples than features.

## Training¶

Training an SVM means solving a convex quadratic programming (QP) problem. In practice this is usually done with specialised solvers such as Sequential Minimal Optimisation (SMO) rather than a generic QP solver.
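A minimal training sketch using scikit-learn, whose `SVC` wraps libsvm's SMO-style QP solver (the dataset here is synthetic, generated for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)  # solves the underlying QP

# Only the support vectors define the learnt hyperplane.
print(clf.support_vectors_.shape)
print(clf.score(X, y))
```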

## Kernels¶

The kernel computes inner products between data points as if they had been mapped into a higher-dimensional space in which linear separation is easier, without ever computing the mapping explicitly. This is known as the **kernel trick**.

### Linear¶

$$K(\mathbf{x}, \mathbf{z}) = \mathbf{x} \cdot \mathbf{z}$$

### Polynomial¶

$$K(\mathbf{x}, \mathbf{z}) = (\gamma\, \mathbf{x} \cdot \mathbf{z} + r)^d$$

### Sigmoid¶

$$K(\mathbf{x}, \mathbf{z}) = \tanh(\gamma\, \mathbf{x} \cdot \mathbf{z} + r)$$

### RBF¶

$$K(\mathbf{x}, \mathbf{z}) = \exp\bigl(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^2\bigr)$$
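The four kernels can be sketched directly in NumPy. The parameter names ($\gamma$, $r$, $d$) follow the common convention and the default values below are illustrative:

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * np.dot(x, z) + coef0) ** degree

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return np.tanh(gamma * np.dot(x, z) + coef0)

def rbf_kernel(x, z, gamma=1.0):
    diff = x - z
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 2.0])
z = np.array([2.0, 1.0])
print(linear_kernel(x, z))      # → 4.0
print(polynomial_kernel(x, z))  # → 125.0
print(rbf_kernel(x, z))         # exp(-2) ≈ 0.135
```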

## Advantages¶

- The optimisation problem is convex, so local optima are not a problem: any local optimum is the global optimum.

## Disadvantages¶

- Cannot naturally handle multiclass classification problems. Applying an SVM to these requires reformulating the problem as a series of binary classification tasks, either one-vs-all or one-vs-one. Learning these tasks separately is inefficient and can hurt generalisation.
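The one-vs-all reformulation can be sketched with scikit-learn's `OneVsRestClassifier`, which fits one binary SVM per class (the Iris dataset is used here purely as an illustrative three-class problem):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One binary SVC is trained per class (positive class vs the rest).
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0))
clf.fit(X, y)

print(len(clf.estimators_))  # → 3, one binary SVM per class
print(clf.score(X, y))
```

Note that `SVC` on its own would instead apply a one-vs-one scheme internally; the wrapper makes the binary decomposition explicit.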