Hyperparameter optimization

A hyperparameter is a parameter of the model that is fixed as part of the model's design or training setup rather than learnt during training. Examples of hyperparameters include the learning rate, the dropout rate and the number of layers. Since they cannot be learnt by gradient descent, and each candidate setting has to be evaluated with a full training run, hyperparameter optimization is a difficult problem.

Gaussian processes

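A Gaussian process is used as a surrogate model of the function we are trying to optimise. In hyperparameter optimization this function maps a hyperparameter setting to the resulting validation performance. For every untried setting the Gaussian process gives both a predicted mean and an uncertainty around it.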
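
As a rough illustration, the sketch below computes the posterior mean and standard deviation of a zero-mean Gaussian process at some query points. The squared-exponential kernel, the length scale, the small noise term and the names rbf_kernel and gp_posterior are all assumptions made for this example rather than anything prescribed by the notes.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, length_scale=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query, length_scale)
    solved = np.linalg.solve(k_tt, k_tq)  # K_tt^{-1} K_tq
    mean = solved.T @ y_train
    # The prior variance of this kernel is 1 at every point.
    var = 1.0 - np.sum(k_tq * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Hypothetical observations: validation accuracy at three hyperparameter values.
x_seen = np.array([0.0, 1.0, 2.0])
y_seen = np.array([0.5, 0.8, 0.6])
mean, std = gp_posterior(x_seen, y_seen, np.array([0.5, 1.5, 5.0]))
```

The standard deviation is close to zero at points that have already been evaluated and grows towards the prior value far away from them, which is the information an acquisition function works with.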

Bayesian optimization

Note that much of the explanation below refers to states. These are irrelevant for hyperparameter optimisation since each training run is initialized in the same way.

Acquisition function

An acquisition function decides the next point to sample while trying to maximize the cumulative reward, balancing exploration and exploitation. Acquisition functions are useful not just in hyperparameter optimization but also in reinforcement learning.


Probability of Improvement

Pick the action which maximises the chance of getting to a state with a value greater than that of the current best state. The reward is 1 if the new state is better and 0 otherwise. Because the size of the improvement is ignored, this strategy will eschew possible large improvements in favour of more certain small ones.

If all nearby states are known to be worse, this strategy can lead to getting stuck in local optima.
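
A minimal sketch of this criterion, assuming the value of a candidate is modelled as a Gaussian with a known mean and standard deviation (for example from a Gaussian process). The function names and the example numbers are illustrative only.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mean, std, best_value):
    """Probability that a Gaussian-distributed value exceeds best_value."""
    if std == 0.0:
        # No uncertainty: the candidate either beats the best value or not.
        return 1.0 if mean > best_value else 0.0
    return normal_cdf((mean - best_value) / std)

best_so_far = 1.0
# Candidate A: an almost certain but tiny improvement.
pi_safe = probability_of_improvement(mean=1.05, std=0.01, best_value=best_so_far)
# Candidate B: no better on average, but a large improvement is possible.
pi_risky = probability_of_improvement(mean=1.0, std=1.0, best_value=best_so_far)
```

Here candidate A scores higher than candidate B even though B has a real chance of a much larger gain, which illustrates the preference for certain small improvements.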

Expected Improvement

Pick the action which maximises the expected improvement of that new state over the current best state. The reward is the difference between the values if the new state is better than the old one and zero otherwise.

A higher expected improvement can be obtained by increasing either the variance or the mean of the value distribution of the next state.
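
Under the assumption that the value of the next state is Gaussian with mean \mu and standard deviation \sigma, the expected improvement over the current best value f* has the closed form (\mu - f*) \Phi(z) + \sigma \phi(z) with z = (\mu - f*) / \sigma. The sketch below implements that formula. The function names are illustrative.

```python
import math

def normal_pdf(z):
    """Density of the standard normal."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best_value):
    """E[max(value - best_value, 0)] for value ~ Normal(mean, std**2)."""
    if std == 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mean - best_value, 0.0)
    z = (mean - best_value) / std
    return (mean - best_value) * normal_cdf(z) + std * normal_pdf(z)

baseline_ei = expected_improvement(mean=1.0, std=1.0, best_value=1.0)
wider_ei = expected_improvement(mean=1.0, std=2.0, best_value=1.0)
higher_ei = expected_improvement(mean=1.5, std=1.0, best_value=1.0)
```

Both wider_ei and higher_ei exceed baseline_ei, matching the observation that either more variance or a higher mean raises the expected improvement.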

Upper Confidence Bound

Calculate the upper bound of the confidence interval for the rewards from each action in a given state. Pick the action for which the upper bound of the reward is greatest. This favours actions with greater uncertainty, since their confidence intervals are wider.

Under a Gaussian distribution the bound has a simple expression. It lies \beta standard deviations above the mean of the distribution of rewards given an action a in some state s:

UCB_{s,a} = \mu_{s,a} + \beta \sigma_{s,a}
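
The formula translates directly into code. In the sketch below the means, standard deviations and the value of \beta are made-up numbers for three hypothetical actions.

```python
import numpy as np

def upper_confidence_bound(mean, std, beta=2.0):
    """Upper confidence bound: mean plus beta standard deviations."""
    return np.asarray(mean) + beta * np.asarray(std)

# Hypothetical reward estimates for three actions.
means = np.array([0.80, 0.75, 0.60])
stds = np.array([0.01, 0.10, 0.05])

scores = upper_confidence_bound(means, stds, beta=2.0)
chosen = int(np.argmax(scores))    # action with the largest bound
greedy = int(np.argmax(means))     # what beta = 0 would pick
```

With \beta = 2 the second action is chosen because its larger uncertainty outweighs its slightly lower mean, whereas \beta = 0 reduces to picking the highest mean.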


K-fold cross-validation

1. Randomly split the dataset into K equally sized parts
2. For i = 1,...,K
3.     Train the model on every part apart from part i
4.     Evaluate the model on part i
5. Report the average of the K accuracy statistics
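
The steps above can be sketched as follows. The fit and predict callables, the threshold classifier and the toy data are hypothetical stand-ins for a real model and dataset. When the dataset size is not divisible by K, np.array_split makes the parts equal to within one sample.

```python
import numpy as np

def k_fold_accuracy(features, labels, k, fit, predict, seed=0):
    """Average accuracy over k folds for a model given by fit/predict."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(labels))   # step 1: random split
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):                       # step 2
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(features[train_idx], labels[train_idx])    # step 3
        predictions = predict(model, features[test_idx])       # step 4
        scores.append(np.mean(predictions == labels[test_idx]))
    return float(np.mean(scores))            # step 5

# Hypothetical toy model: a threshold halfway between the two class means.
def fit_threshold(x, y):
    return 0.5 * (x[y == 0].mean() + x[y == 1].mean())

def predict_threshold(threshold, x):
    return (x > threshold).astype(int)

features = np.concatenate([np.linspace(-2.0, -1.0, 10), np.linspace(1.0, 2.0, 10)])
labels = np.array([0] * 10 + [1] * 10)
accuracy = k_fold_accuracy(features, labels, 5, fit_threshold, predict_threshold)
```

For hyperparameter optimization, a function like this would be called once per candidate setting and its return value used as the score to maximise.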

Reinforcement learning

Hyperparameter optimisation can be framed as a reinforcement learning problem by treating the choice of hyperparameters as the action and the accuracy on the validation set as the reward. The policy that proposes hyperparameters is then trained with a standard algorithm like REINFORCE.
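
A toy sketch of this framing, assuming a softmax policy over a small discrete set of candidate hyperparameter values and a moving-average baseline to reduce variance. The lookup table toy_accuracy is a made-up stand-in for actually training a model and measuring validation accuracy, and all names and constants are illustrative.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def reinforce_search(validation_accuracy, n_choices, steps=3000,
                     step_size=0.1, seed=0):
    """REINFORCE with a softmax policy over n_choices discrete options."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(n_choices)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = int(rng.choice(n_choices, p=probs))
        reward = validation_accuracy(action)
        # Gradient of log pi(action) with respect to the logits.
        grad_log_prob = -probs
        grad_log_prob[action] += 1.0
        logits += step_size * (reward - baseline) * grad_log_prob
        baseline += 0.1 * (reward - baseline)   # moving-average baseline
    return softmax(logits)

# Hypothetical candidates and made-up validation accuracies for each.
learning_rates = [1e-4, 1e-2, 1.0]
toy_accuracy = {0: 0.2, 1: 0.9, 2: 0.5}
policy = reinforce_search(lambda a: toy_accuracy[a], n_choices=3)
```

After training, the policy should place most of its probability on the candidate with the highest reward. In a real setting every reward evaluation is a full training run, so far fewer steps would be affordable.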