Density estimation

The problem of estimating the probability density function of the distribution from which a given set of observations was drawn.

Empirical distribution function

Compute the empirical CDF, \hat{F}(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{x_i \le x\}, and numerically differentiate it to obtain a density estimate.
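A minimal sketch of this idea in Python (the helper name ecdf_density is illustrative, not a library function). The ECDF is a step function, so its exact derivative is a train of spikes; evaluating it on a grid and taking finite differences gives a crude but usable estimate:

```python
import numpy as np

def ecdf_density(sample, grid):
    # Empirical CDF on the grid: fraction of observations <= each grid point.
    sample = np.sort(sample)
    cdf = np.searchsorted(sample, grid, side="right") / len(sample)
    # Finite differences approximate the derivative of the CDF.
    return np.gradient(cdf, grid)

rng = np.random.default_rng(0)
observations = rng.normal(size=1000)
grid = np.linspace(-4, 4, 81)
density = ecdf_density(observations, grid)
```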

Histogram

Take the range of the sample and split it into a fixed number of bins, where the bin count is a hyperparameter. Then estimate the density over each bin as the proportion of the sample that fell within its bounds, divided by the bin width so that the estimate integrates to one.
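A sketch with NumPy, assuming 30 bins (an arbitrary choice for illustration); np.histogram(x, bins=30, density=True) computes the same estimate directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

counts, edges = np.histogram(x, bins=30)  # bin count is the hyperparameter
widths = np.diff(edges)
# Proportion per bin, divided by bin width so the density integrates to one.
density = counts / (counts.sum() * widths)
```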

Kernel Density Estimation

Given a sample x_1, \dots, x_n, the estimated density at a point x is:

\hat{f}(x) = \frac{1}{n}\sum_{i=1}^n K_h(x - x_i)

Where K is the kernel, h > 0 is a smoothing parameter known as the bandwidth, and K_h is the scaled kernel:

K_h(x) = \frac{1}{h}K\big(\frac{x}{h}\big)

A variety of kernels can be used. A common one is the Gaussian, defined as:

K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2}
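A direct translation of these formulas into Python, assuming the Gaussian kernel and an arbitrarily chosen bandwidth of 0.3 (in practice scipy.stats.gaussian_kde or a bandwidth selection rule would typically be used):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, sample, h):
    # \hat{f}(x) = (1/n) sum_i K_h(x - x_i), with K_h(u) = K(u / h) / h.
    u = (x[:, None] - sample[None, :]) / h  # pairwise (x - x_i) / h
    return gaussian_kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid = np.linspace(-4, 4, 200)
density = kde(grid, sample, h=0.3)
```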

Disadvantages

Evaluating the estimate at a single query point requires summing over all n observations, so the complexity at inference time is linear in the size of the sample.

Mixture Model

Estimates the density as a weighted sum of parametric distributions. The estimated density at a point x is:

\hat{f}(x) = \sum_{i=1}^k w_i \phi(x;\theta_i)

Where k is the number of component distributions, each component \phi is parameterised by its own \theta_i, and each is weighted by a scalar w_i \ge 0 such that \sum_{i=1}^k w_i = 1.

The Gaussian is a common choice for the distribution. In this case the estimator is known as a Gaussian Mixture Model.

All of the parameters can be learnt using Expectation-Maximization, except for k, which is a hyperparameter.
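A sketch using scikit-learn's GaussianMixture, which fits the weights, means and covariances with EM; the two-component data and the choice k = 2 are assumptions made here for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Draw from two Gaussians so a two-component mixture is a sensible fit.
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(1, 1.0, 500)])

# n_components is k, the hyperparameter; fit() runs EM on the sample.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))

grid = np.linspace(-5, 4, 200).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))  # score_samples returns log-density
```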