# Statistics

## Covariance

The covariance between two random variables $X$ and $Y$ is defined as:

$$\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$
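
As a minimal sketch of this definition, the expectation can be replaced by a sample average. The function name `covariance` and the example data are illustrative assumptions, and the population form (dividing by the number of samples) is used rather than the unbiased sample form.

```python
import numpy as np

def covariance(x, y):
    """Population covariance: the mean product of deviations from the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

# Illustrative data: y is exactly 2 * x, so the two move together.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(x, y)
```

This should agree with `np.cov(x, y, bias=True)[0, 1]`, which uses the same normalisation.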

### Covariance matrix

A square matrix $\Sigma$ where $\Sigma_{ij} = \operatorname{Cov}(X_i, X_j)$ and $X_i$ and $X_j$ are two variables.

There are three types of covariance matrix:

- Full - All entries are specified. Has $n(n+1)/2$ free parameters for $n$ variables, since the matrix is symmetric.
- Diagonal - The matrix is diagonal, meaning all off-diagonal entries are zero. Variances can differ across dimensions but there is no interplay between the dimensions. Has $n$ parameters.
- Spherical - The matrix is equal to the identity matrix multiplied by a constant. This means the variance is the same in all dimensions. Has 1 parameter.

A valid covariance matrix is always symmetric and positive semi-definite.
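
The three types, and the validity condition, can be sketched with NumPy. The random data, the choice of the average variance as the spherical constant, and the helper name `is_valid_covariance` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))  # 500 observations of 3 variables

# Full: every entry estimated from the data.
full = np.cov(data, rowvar=False)
# Diagonal: keep the per-dimension variances, zero the off-diagonal entries.
diagonal = np.diag(np.diag(full))
# Spherical: identity times one constant (here the average variance).
spherical = np.eye(3) * np.trace(full) / 3

def is_valid_covariance(m, tol=1e-10):
    """Check symmetry and positive semi-definiteness (no negative eigenvalues)."""
    symmetric = np.allclose(m, m.T)
    psd = np.all(np.linalg.eigvalsh(m) >= -tol)
    return bool(symmetric and psd)
```

A symmetric matrix such as `[[1, 2], [2, 1]]` fails the check because one of its eigenvalues is negative.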

## Geometric mean

The geometric mean of a set of $n$ inputs $x_1, \dots, x_n$ is:

$$\left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

Only applicable to positive numbers since otherwise it may involve taking the root of a negative number.
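
A minimal sketch, assuming the inputs arrive as a plain list. The function name is hypothetical, and averaging logarithms instead of multiplying directly is a design choice of this sketch to avoid overflow on long inputs.

```python
import math

def geometric_mean(xs):
    """n-th root of the product of n positive numbers."""
    if len(xs) == 0 or any(x <= 0 for x in xs):
        raise ValueError("geometric mean requires a non-empty set of positive inputs")
    # exp(mean(log x)) equals the n-th root of the product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

gm = geometric_mean([1.0, 4.0, 16.0])  # cube root of 64
```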

## Harmonic mean

The harmonic mean for a set of $n$ inputs $x_1, \dots, x_n$ is:

$$\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Cannot be computed if one of the numbers is zero since that would necessitate dividing by zero.

Used for the F1-score, which is the harmonic mean of the precision and recall.
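
As a small sketch, the harmonic mean and the F1-score built on it could look like this. The function names are illustrative, and raising an error on a zero input is a choice of this sketch (many F1 implementations instead define the score as 0 in that case).

```python
def harmonic_mean(xs):
    """Number of inputs divided by the sum of their reciprocals."""
    if len(xs) == 0 or any(x == 0 for x in xs):
        raise ValueError("harmonic mean is undefined for empty input or zeros")
    return len(xs) / sum(1.0 / x for x in xs)

def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return harmonic_mean([precision, recall])

hm = harmonic_mean([1.0, 2.0, 4.0])
f1 = f1_score(0.5, 1.0)
```

Because the harmonic mean is dominated by the smaller input, a classifier with perfect recall but a precision of 0.5 scores about 0.67 rather than the arithmetic mean of 0.75.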

## Heteroscedasticity

When the variance of a model's error is not constant across observations, for example when the size of the error depends on one or more of the features.
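
One way to see this effect is with simulated data. The sketch below assumes a single feature `x` and noise whose standard deviation grows with `x`. All names and constants are illustrative, not taken from any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=2000)
noise = rng.normal(scale=0.5 * x)  # noise spread grows with the feature
y = 2.0 * x + noise

# Fit a straight line and inspect what is left over.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Compare the residual variance for small and large feature values.
median_x = np.median(x)
var_low = residuals[x < median_x].var()
var_high = residuals[x >= median_x].var()

# The magnitude of the error is correlated with the feature.
corr = np.corrcoef(x, np.abs(residuals))[0, 1]
```

With homoscedastic noise (a constant `scale`), `var_low` and `var_high` would be roughly equal and `corr` close to zero.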

## Moments

- 1st moment - Arithmetic mean
- 2nd moment (central) - Variance
- 3rd moment (standardised) - Skewness
- 4th moment (standardised) - Kurtosis
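
The four quantities above can be sketched in NumPy as follows. This assumes population (biased) estimators, the central form for the variance, and the standardised form for skewness and kurtosis (not excess kurtosis). The function name `moments` is illustrative.

```python
import numpy as np

def moments(xs):
    """Return mean, variance, skewness and kurtosis of a sample."""
    xs = np.asarray(xs, dtype=float)
    mean = xs.mean()
    centred = xs - mean
    variance = np.mean(centred ** 2)
    std = np.sqrt(variance)
    skewness = np.mean((centred / std) ** 3)
    kurtosis = np.mean((centred / std) ** 4)
    return mean, variance, skewness, kurtosis

mean, variance, skewness, kurtosis = moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this symmetric input the skewness is zero. A normal distribution has a kurtosis of 3 under this convention.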

## Moving average

A moving average smooths a sequence of observations.

### Exponential moving average (EMA)

A type of moving average in which the influence of past observations on the current average diminishes exponentially with time.

$$m_t = \alpha m_{t-1} + (1 - \alpha) x_t$$

$m_t$ is the moving average at time $t$, $x_t$ is the input at time $t$ and $\alpha$ is a hyperparameter between 0 and 1. As $\alpha$ decreases, the moving average weights recent observations more strongly.
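
A minimal sketch of the recursion, assuming the common form in which the previous average is scaled by the hyperparameter and the new input by one minus it. The function and argument names are illustrative.

```python
def exponential_moving_average(xs, alpha, initial=0.0):
    """Return the EMA after each input, starting from `initial`."""
    m = initial
    averages = []
    for x in xs:
        # m_t = alpha * m_{t-1} + (1 - alpha) * x_t
        m = alpha * m + (1.0 - alpha) * x
        averages.append(m)
    return averages

ema = exponential_moving_average([1.0, 1.0, 1.0], alpha=0.5)
```

Even though every input is 1, the averages here are 0.5, 0.75 and 0.875, because the zero starting value still carries weight in the early steps.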

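#### Bias correction

If we initialise the EMA to equal zero ($m_0 = 0$) it will be very biased towards zero around the start. To correct this we can start with the decay rate $\alpha$ being close to 0 and gradually increase it. This effect can be achieved by dividing the uncorrected average $m_t$ by $1 - \alpha^t$:

$$\hat{m}_t = \frac{m_t}{1 - \alpha^t}$$

This is equivalent to running the recursion on $\hat{m}_t$ with an effective decay rate of $\alpha (1 - \alpha^{t-1}) / (1 - \alpha^t)$, which is 0 at $t = 1$ and tends to $\alpha$ as $t$ grows.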

See Adam: A Method for Stochastic Optimization, Kingma and Ba (2015) for an example of this bias correction being used in practice.
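
The correction can be sketched as follows, assuming a zero-initialised average and a decay rate in $[0, 1)$. Dividing by one minus the decay rate raised to the step number follows the form used in Adam. The function name is illustrative.

```python
def bias_corrected_ema(xs, alpha):
    """EMA initialised at zero, with each output rescaled to undo the zero bias."""
    m = 0.0
    corrected = []
    for t, x in enumerate(xs, start=1):
        m = alpha * m + (1.0 - alpha) * x          # uncorrected update
        corrected.append(m / (1.0 - alpha ** t))   # divide out the bias factor
    return corrected

corrected = bias_corrected_ema([1.0, 1.0, 1.0], alpha=0.5)
```

For a constant input the corrected average equals that constant from the first step, whereas the uncorrected average only approaches it gradually.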

## Point estimate

An estimate of a parameter, such as the mean of a population. It describes the belief about this quantity with a single number, in contrast with a distribution, which describes the belief about the parameter using multiple numbers.
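
To illustrate the contrast, the sketch below computes a point estimate of a population mean and, alongside it, a set of resampled means that describes the uncertainty with many numbers. The bootstrap is used purely as an illustrative way to obtain such a distribution and is an assumption of this sketch. The data is simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)  # simulated observations

# Point estimate: a single number summarising the belief about the mean.
point_estimate = sample.mean()

# Distribution: many plausible values for the mean, from resampling the data.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
```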