Statistics

Arithmetic mean

The arithmetic mean of a set of inputs \{x_1,x_2,...,x_n\} is:

A(x_1,x_2,...,x_n) = \frac{1}{n}\sum_{i=1}^n x_i

Correlation

The correlation between two random variables X and Y is:

\text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{V(X)V(Y)}}

Covariance

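The covariance between two random variables X and Y is defined as:

\text{Cov}(X,Y) = E[(X - \mu_x)(Y - \mu_y)]

where \mu_x and \mu_y are the means of X and Y. For n paired observations (x_i, y_i) this becomes:

\text{Cov}(X,Y) = \frac{1}{n}\sum_{i=1}^n (x_i - \mu_x)(y_i - \mu_y)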
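As a rough illustration, the covariance of n paired observations and the correlation derived from it can be sketched in plain Python. The function names and sample data are made up for this example, and the variance is taken as V(X) = Cov(X, X).

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance of paired observations (divides by n)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / sqrt(V(X) V(Y)), using V(X) = Cov(X, X)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Illustrative data: ys is exactly twice xs, so the correlation is 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(xs, ys)    # 2.5
corr_xy = correlation(xs, ys)  # 1.0
```

The correlation is undefined when either variable has zero variance; this sketch would raise ZeroDivisionError in that case.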

Covariance matrix

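A square matrix \Sigma where \Sigma_{ij} = \text{Cov}(X_i,X_j) is the covariance between the variables X_i and X_j. The diagonal entries \Sigma_{ii} are the variances of the individual variables.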

There are three types of covariance matrix:

  • Full - All entries are specified. Has O(n^2) parameters for n variables.
  • Diagonal - The matrix is diagonal, meaning all off-diagonal entries are zero. Variances can differ across dimensions but there is no interplay between the dimensions. Has O(n) parameters.
  • Spherical - The matrix is equal to the identity matrix multiplied by a constant. This means the variance is the same in all dimensions. Has O(1) parameters.

A valid covariance matrix is always symmetric and positive semi-definite.
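The three types can be sketched in plain Python as follows. The helper names are hypothetical, and taking the mean of the variances as the constant for the spherical case is an assumption made for this example, not the only possible choice.

```python
def covariance_matrix(columns):
    """Full covariance matrix; columns[i] holds the observations of variable i."""
    n_obs = len(columns[0])
    d = len(columns)
    means = [sum(col) / n_obs for col in columns]
    return [
        [
            sum((a - means[i]) * (b - means[j])
                for a, b in zip(columns[i], columns[j])) / n_obs
            for j in range(d)
        ]
        for i in range(d)
    ]

def to_diagonal(sigma):
    """Keep the variances and set every off-diagonal entry to zero."""
    d = len(sigma)
    return [[sigma[i][j] if i == j else 0.0 for j in range(d)] for i in range(d)]

def to_spherical(sigma):
    """Identity times one constant (assumed here: the mean of the variances)."""
    d = len(sigma)
    c = sum(sigma[i][i] for i in range(d)) / d
    return [[c if i == j else 0.0 for j in range(d)] for i in range(d)]

# Two variables, four observations each (illustrative data).
columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
full = covariance_matrix(columns)  # [[1.25, 2.5], [2.5, 5.0]]
diagonal = to_diagonal(full)       # [[1.25, 0.0], [0.0, 5.0]]
spherical = to_spherical(full)     # [[3.125, 0.0], [0.0, 3.125]]
```

The full matrix comes out symmetric because Cov(X_i, X_j) = Cov(X_j, X_i).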

Geometric mean

The geometric mean of a set of inputs \{x_1,x_2,...,x_n\} is:

G(x_1,x_2,...,x_n) = \sqrt[\leftroot{-2}\uproot{2}n]{x_1x_2...x_n}

Only applicable to positive numbers since otherwise it may involve taking the root of a negative number.
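A minimal sketch of the geometric mean in Python. It averages logarithms instead of multiplying the inputs directly, which is a common way to avoid overflow on long inputs. The function name is chosen for illustration only.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive inputs")
    # exp(mean(log x_i)) equals the n-th root of the product of the x_i.
    return math.exp(sum(math.log(v) for v in values) / len(values))

g = geometric_mean([1.0, 4.0, 16.0])  # cube root of 64, approximately 4.0
```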

Harmonic mean

The harmonic mean for a set of inputs \{x_1,x_2,...,x_n\} is:

H(x_1,x_2,...,x_n) = n/\sum_{i=1}^n \frac{1}{x_i}

Cannot be computed if one of the numbers is zero since that would necessitate dividing by zero.

Used for the F1-score, which is the harmonic mean of the precision and recall.
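A short sketch of the harmonic mean and its use for the F1-score. The function names are illustrative, and raising an error on a zero input is one possible design choice for this example.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals; undefined if any value is zero."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1.0 / v for v in values)

def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return harmonic_mean([precision, recall])

f1 = f1_score(0.5, 1.0)  # 2 / (1/0.5 + 1/1.0) = 2/3
```

The harmonic mean is pulled towards the smaller input: here the result of 2/3 is below the arithmetic mean of 0.75.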

Heteroscedasticity

When the variance of a model's error is not constant across observations, for example when the spread of the errors depends on one or more of the features.

Moments

Moving average

A moving average smooths a sequence of observations.

Exponential moving average (EMA)

A type of moving average in which the influence of past observations on the current average diminishes exponentially with time.

m_t = \alpha m_{t-1} + (1 - \alpha) x_t

m_t is the moving average at time t, x_t is the input at time t and 0 < \alpha < 1 is a hyperparameter. As \alpha decreases, the moving average weights recent observations more strongly.
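The recursion can be sketched in a few lines of Python. The function name and the default starting value m_0 = 0 are assumptions made for this example.

```python
def exponential_moving_average(xs, alpha, m0=0.0):
    """Return the list [m_1, ..., m_T] for the inputs xs."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    m = m0
    averages = []
    for x in xs:
        # m_t = alpha * m_{t-1} + (1 - alpha) * x_t
        m = alpha * m + (1.0 - alpha) * x
        averages.append(m)
    return averages

ema = exponential_moving_average([1.0, 1.0, 1.0], alpha=0.5)
# ema == [0.5, 0.75, 0.875]
```

With a constant input of 1 and m_0 = 0, the average only approaches 1 gradually.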

Bias correction

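If we initialise the EMA to equal zero (m_0 = 0) it will be very biased towards zero around the start. To correct this we can divide the average by 1 - \alpha^t, which has the same effect as starting with \alpha equal to 0 and gradually increasing it towards its nominal value. The bias-corrected estimate is:

\hat{m}_t = \frac{m_t}{1 - \alpha^t}

Here m_t is the uncorrected moving average defined above. The correction is applied only to the value that is read out and is not fed back into the recursion.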

See Adam: A Method for Stochastic Optimization, Kingma and Ba (2015) for an example of this bias correction being used in practice.
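A minimal sketch of the correction in the form used by Adam, where the uncorrected average is kept as state and the division by 1 - \alpha^t is applied only to the value that is read out. The function name is hypothetical.

```python
def bias_corrected_ema(xs, alpha):
    """EMA initialised at zero, with each output divided by 1 - alpha**t."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    m = 0.0  # uncorrected running average, m_0 = 0
    corrected = []
    for t, x in enumerate(xs, start=1):
        m = alpha * m + (1.0 - alpha) * x         # uncorrected recursion
        corrected.append(m / (1.0 - alpha ** t))  # corrected read-out only
    return corrected

corrected = bias_corrected_ema([1.0, 1.0, 1.0], alpha=0.5)
# Every corrected value is 1.0; the uncorrected ones are 0.5, 0.75, 0.875.
```

For a constant input the corrected average recovers the input exactly from the first step, because the weights on past observations sum to 1 - \alpha^t.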

Point estimate

A single-number estimate of a parameter, such as the mean of a population. This contrasts with describing the belief about the parameter with a distribution over its possible values.

Skewness

Measures the asymmetry of a probability distribution.

\text{Skew}(X) = E\bigg[\bigg(\frac{X - \mu}{\sigma}\bigg)^3\bigg]

where \mu is the mean and \sigma is the standard deviation of X.
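As an illustration, the skewness of a finite set of equally likely values can be computed directly from the definition. This sketch assumes population moments (dividing by n) and does not apply any small-sample correction. The function name is made up.

```python
import math

def skewness(values):
    """Mean of the cubed standardised deviations, using population moments."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((v - mu) / sigma) ** 3 for v in values) / n

symmetric = skewness([1.0, 2.0, 3.0])          # 0 for a symmetric set
right_tailed = skewness([1.0, 1.0, 1.0, 5.0])  # positive: long tail to the right
```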

Standard deviation

The square root of the variance. The formula is:

\sigma = \sqrt{E[(X-\mu)^2]}

where \mu is the mean of X.

Sample standard deviation

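s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n(x_i-\bar{x})^2}

where \bar{x} is the mean of the sample.

Note that although dividing by n-1 makes the sample variance unbiased, s is still a biased estimator of the population standard deviation because the square root is a non-linear function. Estimators which are unbiased exist but they each only apply to some population distributions.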
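Python's standard library provides both versions: statistics.pstdev divides by n and statistics.stdev divides by n - 1. A small sketch on made-up data:

```python
import statistics

# Illustrative data with mean 5 and squared deviations summing to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_sd = statistics.pstdev(data)  # sqrt(32 / 8) = 2.0
sample_sd = statistics.stdev(data)       # sqrt(32 / 7), slightly larger
```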

Variance

The variance of X=\{x_1, ..., x_n\} is:

V(X) = E[(X-\mu)^2]

where \mu is the mean of X.

The formula can also be written as:

V(X) = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2

Sample variance

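When it is impractical to compute the variance over the entire population, we can take a sample instead and compute the sample variance. The formula for the unbiased sample variance is:

s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2

where \bar{x} is the mean of the sample. Dividing by n-1 rather than n compensates for measuring the deviations from \bar{x} instead of the unknown population mean \mu.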
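The difference between the two divisors can be seen in a short sketch. The function names and data are illustrative only.

```python
def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((v - x_bar) ** 2 for v in values) / (n - 1)

data = [1.0, 2.0, 3.0, 4.0]
pop_var = population_variance(data)  # 5 / 4 = 1.25
samp_var = sample_variance(data)     # 5 / 3, about 1.667
```

The two values converge as n grows, since the ratio between them is n / (n - 1).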