Density estimation

The problem of estimating the probability density function of the distribution from which a given set of observations was drawn.

Empirical distribution function

Compute the empirical CDF, \hat{F}(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{x_i \le x\}, and numerically differentiate it to obtain a density estimate.
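A minimal sketch of this idea in Python (the helper name ecdf_density is illustrative, not a library function). The ECDF is a step function, so its exact derivative is a train of spikes; evaluating it on a grid and taking finite differences gives a crude but usable estimate:

```python
import numpy as np

def ecdf_density(sample, grid):
    # Empirical CDF on the grid: fraction of observations <= each grid point.
    sample = np.sort(sample)
    cdf = np.searchsorted(sample, grid, side="right") / len(sample)
    # Finite differences approximate the derivative of the CDF.
    return np.gradient(cdf, grid)

rng = np.random.default_rng(0)
observations = rng.normal(size=1000)
grid = np.linspace(-4, 4, 81)
density = ecdf_density(observations, grid)
```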

Histogram

Take the range of the sample and split it into a fixed number of bins, where the bin count is a hyperparameter. Then estimate the density over each bin as the proportion of the sample that fell within its bounds, divided by the bin width so that the estimate integrates to one.
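A sketch with NumPy, assuming 30 bins (an arbitrary choice for illustration); np.histogram(x, bins=30, density=True) computes the same estimate directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

counts, edges = np.histogram(x, bins=30)  # bin count is the hyperparameter
widths = np.diff(edges)
# Proportion per bin, divided by bin width so the density integrates to one.
density = counts / (counts.sum() * widths)
```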

Kernel Density Estimation

Given a sample x_1, \dots, x_n, the estimated density at a point x is:

\hat{f}(x) = \frac{1}{n}\sum_{i=1}^n K_h(x - x_i)

Where K is the kernel, h > 0 is a smoothing parameter known as the bandwidth, and K_h is the scaled kernel:

K_h(x) = \frac{1}{h}K\big(\frac{x}{h}\big)

A variety of kernels can be used. A common one is the Gaussian, defined as:

K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2}
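A direct translation of these formulas into Python, assuming the Gaussian kernel and an arbitrarily chosen bandwidth of 0.3 (in practice scipy.stats.gaussian_kde or a bandwidth selection rule would typically be used):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, sample, h):
    # \hat{f}(x) = (1/n) sum_i K_h(x - x_i), with K_h(u) = K(u / h) / h.
    u = (x[:, None] - sample[None, :]) / h  # pairwise (x - x_i) / h
    return gaussian_kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid = np.linspace(-4, 4, 200)
density = kde(grid, sample, h=0.3)
```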

Disadvantages

Evaluating the estimate at a single query point requires summing over all n observations, so the complexity at inference time is linear in the size of the sample.

Mixture Model

Estimates the density as a weighted sum of parametric distributions. The estimated density at a point x is:

\hat{f}(x) = \sum_{i=1}^k w_i \phi(x;\theta_i)

Where k is the number of component distributions, each component \phi is parameterised by its own \theta_i, and each is weighted by a scalar w_i \ge 0 such that \sum_{i=1}^k w_i = 1.

The Gaussian is a common choice for the distribution. In this case the estimator is known as a Gaussian Mixture Model.

All of the parameters can be learnt using Expectation-Maximization, except for k, which is a hyperparameter.
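A sketch using scikit-learn's GaussianMixture, which fits the weights, means and covariances with EM; the two-component data and the choice k = 2 are assumptions made here for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Draw from two Gaussians so a two-component mixture is a sensible fit.
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(1, 1.0, 500)])

# n_components is k, the hyperparameter; fit() runs EM on the sample.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))

grid = np.linspace(-5, 4, 200).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))  # score_samples returns log-density
```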