Gaussian processes

Gaussian processes model a probability distribution over functions.

Let f be a function mapping input vectors to real numbers. Writing x for a set of inputs and f(x) for the vector of function values at those inputs, we can write:

f(x) \sim GP(m(x),k(x,x'))

where m(x) represents the mean vector:

m(x) = \mathbb{E}[f(x)]

and k(x,x') is the kernel function.

Kernel function

The kernel k(x,x') is the covariance function of the Gaussian process. It gives the covariance between the function values at any two inputs x and x':

k(x,x') = \text{Cov}(f(x),f(x'))

k(x,x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))^T]
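As a rough numerical check of this definition, the sketch below draws many joint samples of f at three inputs and compares their empirical covariance with the kernel matrix. It assumes NumPy, a zero mean function, scalar inputs, and the Gaussian kernel \exp(-||x - x'||_2^2), chosen purely for illustration. `gaussian_kernel` is a hypothetical helper name, and the samples come from NumPy's built-in multivariate normal sampler.

```python
import numpy as np

def gaussian_kernel(a, b):
    # Hypothetical helper: k(x, x') = exp(-(x - x')^2) for all pairs of scalar inputs
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x = np.array([0.0, 0.5, 2.0])
K = gaussian_kernel(x, x)

# Draw 200,000 joint samples of (f(x_1), f(x_2), f(x_3)) ~ N(0, K), assuming m(x) = 0
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=200_000)

# Monte Carlo estimate of E[(f(x) - m(x))(f(x') - m(x'))^T] with m = 0
empirical_K = samples.T @ samples / len(samples)
print(np.abs(empirical_K - K).max())  # small, and shrinks as the sample count grows
```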

The kernel can be thought of as a prior for the shape of the function, encoding our expectations for the amount of smoothness or non-linearity.

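Not every function of two inputs is a valid kernel. For any finite set of inputs, the kernel must produce a symmetric, positive semi-definite covariance matrix (such kernels are conventionally called positive-definite kernels).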
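One way to see why this condition is needed: for any weight vector a, the variance of the combination a^T f(x) is a^T K a, and a variance can never be negative. The sketch below (NumPy assumed; `gram_matrix`, `is_psd`, `gaussian` and `distance` are hypothetical names) tests a candidate kernel on a grid by checking that the eigenvalues of its matrix are non-negative up to round-off. Passing on one grid does not prove a kernel is valid, but failing proves it is not. The distance |x - x'| is used as an invalid example: its matrix has zeros on the diagonal, so its eigenvalues sum to zero and at least one must be negative.

```python
import numpy as np

def gram_matrix(kernel, xs):
    # K[i, j] = k(xs[i], xs[j]) for every pair of inputs
    return np.array([[kernel(a, b) for b in xs] for a in xs])

def is_psd(K, tol=1e-10):
    # Symmetric, and every eigenvalue >= 0 up to floating-point round-off
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

def gaussian(a, b):
    # A valid kernel: exp(-(x - x')^2)
    return np.exp(-(a - b) ** 2)

def distance(a, b):
    # Not a valid kernel: zero diagonal forces a negative eigenvalue
    return abs(a - b)

xs = np.linspace(-3.0, 3.0, 20)
print(is_psd(gram_matrix(gaussian, xs)))
print(is_psd(gram_matrix(distance, xs)))
```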

Linear kernel

k(x,x') = x \cdot x'

Some functions sampled from a Gaussian process with a linear kernel:

_images/linear.png
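Functions sampled with this kernel are linear, which follows from a short calculation: if f(x) = w \cdot x with weights w drawn from a standard Gaussian N(0, I), then Cov(f(x), f(x')) = x^T E[w w^T] x' = x \cdot x', exactly the linear kernel. The NumPy sketch below (helper name `linear_kernel` is hypothetical; the inputs are random points in R^2 chosen for illustration) builds the kernel matrix and draws one such function. It also shows that K = X X^T has rank at most the input dimension, so the linear kernel gives matrices that are positive semi-definite but generally not positive definite.

```python
import numpy as np

def linear_kernel(X1, X2):
    # k(x, x') = x . x' for every pair of rows of X1 and X2
    return X1 @ X2.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # 50 illustrative inputs in R^2
K = linear_kernel(X, X)

# K = X X^T has rank at most 2 (the input dimension), so it is singular
print(np.linalg.matrix_rank(K))

# Weight-space view: f(x) = w . x with w ~ N(0, I) has covariance K
w = rng.normal(size=2)
f = X @ w   # one sampled function evaluated at the 50 inputs
```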

Polynomial kernel

k(x,x') = (x \cdot x' + a)^b

Functions sampled from a Gaussian process with a polynomial kernel where a=1 and b=2:

_images/polynomial_2.png
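For scalar inputs with a=1 and b=2, expanding the kernel gives (x x' + 1)^2 = x^2 x'^2 + 2 x x' + 1 = \phi(x) \cdot \phi(x') with \phi(x) = (x^2, \sqrt{2} x, 1). A sample can therefore be written f(x) = w \cdot \phi(x) with w \sim N(0, I), whose covariance is \phi(x) \cdot \phi(x'), so each sample is a random quadratic. The NumPy sketch below checks the expansion numerically; `polynomial_kernel` and `features` are hypothetical helper names.

```python
import numpy as np

def polynomial_kernel(x1, x2, a=1.0, b=2):
    # k(x, x') = (x x' + a)^b for every pair of scalar inputs
    return (np.outer(x1, x2) + a) ** b

def features(x):
    # Explicit feature map for a = 1, b = 2: phi(x) = (x^2, sqrt(2) x, 1)
    return np.stack([x ** 2, np.sqrt(2.0) * x, np.ones_like(x)], axis=1)

x = np.linspace(-2.0, 2.0, 9)
K = polynomial_kernel(x, x)
Phi = features(x)
print(np.allclose(K, Phi @ Phi.T))   # the kernel equals the feature dot product

# A sample from the GP is a random quadratic w . phi(x) with w ~ N(0, I)
rng = np.random.default_rng(1)
f = Phi @ rng.standard_normal(3)
```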

Gaussian kernel

k(x,x') = \exp(-||x - x'||_2^2)

Some functions sampled from a GP with a Gaussian kernel:

_images/gaussian.png
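The Gaussian kernel depends only on the distance between inputs: nearby inputs get a correlation close to 1 and distant inputs a correlation close to 0, which is why its samples vary smoothly. A vectorized NumPy sketch for inputs stored as the rows of a matrix (`gaussian_kernel` is a hypothetical helper name; the example inputs are arbitrary):

```python
import numpy as np

def gaussian_kernel(X1, X2):
    # Squared Euclidean distance ||x - x'||_2^2 for every pair of rows
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    # k(x, x') = exp(-||x - x'||_2^2)
    return np.exp(-sq_dists)

X = np.array([[0.0], [0.1], [1.0], [3.0]])   # four 1-D inputs
K = gaussian_kernel(X, X)

# Correlation of f(0) with f(0.1), f(1) and f(3): exp(-0.01), exp(-1), exp(-9)
print(np.round(K[0], 4))
```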

Laplacian kernel

k(x,x') = \exp(-||x - x'||_2)

Functions sampled from a GP with a Laplacian kernel:

_images/laplace.png
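Samples from the Laplacian kernel are much rougher than samples from the Gaussian kernel. For a zero-mean GP with k(x,x) = 1, the expected squared change over a step r is E[(f(x+r) - f(x))^2] = 2(1 - k(x, x+r)). For small r this is about 2r for the Laplacian kernel but about 2r^2 for the Gaussian kernel, so a Laplacian sample changes by roughly \sqrt{r} over a step of size r, too fast for the function to be differentiable. The NumPy sketch below compares the two at r = 0.01; `pairwise_dists`, `laplacian_kernel` and `gaussian_kernel` are hypothetical helper names.

```python
import numpy as np

def pairwise_dists(X1, X2):
    # Euclidean distance ||x - x'||_2 for every pair of rows
    return np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1))

def laplacian_kernel(X1, X2):
    # k(x, x') = exp(-||x - x'||_2)
    return np.exp(-pairwise_dists(X1, X2))

def gaussian_kernel(X1, X2):
    # k(x, x') = exp(-||x - x'||_2^2)
    return np.exp(-pairwise_dists(X1, X2) ** 2)

x0 = np.array([[0.0]])
x1 = np.array([[0.01]])   # a step of size r = 0.01

# Expected squared increment E[(f(x1) - f(x0))^2] = 2 (1 - k(x0, x1))
lap_step = 2.0 * (1.0 - laplacian_kernel(x0, x1)[0, 0])   # roughly 2 r
gau_step = 2.0 * (1.0 - gaussian_kernel(x0, x1)[0, 0])    # roughly 2 r^2
print(lap_step, gau_step)
```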

Sampling

Pseudocode to sample from a Gaussian process:

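  1. Choose a vector of inputs x at which to evaluate f(x), where f is a function sampled from the Gaussian process.
  2. Compute the covariance matrix K = k(x,x), whose (i,j) entry is k(x_i,x_j).
  3. Perform a Cholesky decomposition K = LL^T, yielding a lower triangular matrix L. In practice a small multiple of the identity is often added to K first, since kernel matrices are frequently numerically singular.
  4. Sample a vector z, of the same length as x, from a standard Gaussian distribution.
  5. Compute the matrix-vector product Lz and add the mean vector m(x) to get a sample of f(x). This works because Cov(Lz) = L E[zz^T] L^T = LL^T = K.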
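The steps above can be sketched in Python with NumPy as follows. The Gaussian kernel is used as an example, and the names `sample_gp` and `gaussian_kernel` as well as the `jitter` default of 1e-6 are illustrative assumptions, not part of the procedure as stated. The jitter is a small multiple of the identity added to K so that the Cholesky decomposition does not fail on a numerically singular matrix.

```python
import numpy as np

def gaussian_kernel(x1, x2):
    # k(x, x') = exp(-(x - x')^2) for every pair of scalar inputs
    return np.exp(-(x1[:, None] - x2[None, :]) ** 2)

def sample_gp(kernel, x, n_samples=1, mean=None, jitter=1e-6, rng=None):
    """Draw n_samples functions from GP(mean, kernel), evaluated at inputs x."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    # Step 2: covariance matrix K[i, j] = k(x_i, x_j)
    K = kernel(x, x)
    # Step 3: lower-triangular L with L L^T = K + jitter * I (jitter is an assumption)
    L = np.linalg.cholesky(K + jitter * np.eye(n))
    # Step 4: standard Gaussian draws, one column per sample
    z = rng.standard_normal((n, n_samples))
    # Step 5: f = m + L z has covariance L L^T, i.e. K plus the tiny jitter
    m = np.zeros(n) if mean is None else np.asarray(mean, dtype=float)
    return m[:, None] + L @ z

# Step 1: the inputs at which to evaluate the sampled functions
x = np.linspace(-5.0, 5.0, 100)
f = sample_gp(gaussian_kernel, x, n_samples=3, rng=np.random.default_rng(0))
print(f.shape)  # (100, 3): one column per sampled function
```

Each column of f can be plotted against x to produce pictures like the ones above. `np.linalg.cholesky` returns the lower-triangular factor, which matches step 3.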