# Statistics

## Arithmetic mean

The arithmetic mean of a set of inputs $x_1, \dots, x_n$ is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

## Correlation

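The correlation between two random variables $X$ and $Y$ is:

$$\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.

## Covariance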

The covariance between two random variables $X$ and $Y$ is defined as:

$$\operatorname{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

### Covariance matrix

A square matrix $\Sigma$ where $\Sigma_{ij} = \operatorname{cov}(X_i, X_j)$ and $X_i$ and $X_j$ are two variables.

There are three types of covariance matrix:

• Full - All entries are specified. Has $O(n^2)$ parameters for $n$ variables.
• Diagonal - The matrix is diagonal, meaning all off-diagonal entries are zero. Variances can differ across dimensions but there is no interplay between the dimensions. Has $O(n)$ parameters.
• Spherical - The matrix is equal to the identity matrix multiplied by a constant. This means the variance is the same in all dimensions. Has $O(1)$ parameters.

A valid covariance matrix is always symmetric and positive semi-definite.
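As a rough illustration of the definitions above, the sketch below computes the covariance, correlation and full covariance matrix in plain Python, then derives diagonal and spherical versions from it. The data, the helper names and the choice of the average variance as the spherical constant are all assumptions made for this example; the population form (dividing by the number of observations) is used throughout.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def covariance_matrix(columns):
    """Full covariance matrix: entry (i, j) is cov(columns[i], columns[j])."""
    return [[covariance(a, b) for b in columns] for a in columns]

# Hypothetical data: three variables, each observed four times.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # exactly twice the first variable
    [4.0, 1.0, 3.0, 2.0],
]
full = covariance_matrix(columns)
n = len(columns)

# Diagonal: keep the variances, set every off-diagonal entry to zero.
diagonal = [[full[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Spherical: one shared variance (here the average variance) times the identity.
shared = sum(full[i][i] for i in range(n)) / n
spherical = [[shared if i == j else 0.0 for j in range(n)] for i in range(n)]

print(full)
print(correlation(columns[0], columns[1]))
```

Because the second variable is an exact multiple of the first, their correlation comes out as 1 (up to floating-point rounding), and the full matrix is symmetric as expected.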

## Geometric mean

The geometric mean of a set of inputs $x_1, \dots, x_n$ is:

$$\left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

Only applicable to positive numbers since otherwise it may involve taking the root of a negative number.

## Harmonic mean

The harmonic mean of a set of inputs $x_1, \dots, x_n$ is:

$$\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Cannot be computed if one of the numbers is zero since that would necessitate dividing by zero.

Used for the F1-score, which is the harmonic mean of the precision and recall.
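The arithmetic, geometric and harmonic means can be compared side by side with a short sketch. The function names and inputs are chosen for illustration; rejecting zero in the geometric mean is a choice of this sketch, made because the logarithm used here is undefined at zero.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean requires positive inputs")
    # Averaging logs avoids overflow from multiplying many numbers together.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    if any(x == 0 for x in xs):
        raise ValueError("harmonic mean is undefined when an input is zero")
    return len(xs) / sum(1 / x for x in xs)

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return harmonic_mean([precision, recall])

values = [1.0, 2.0, 4.0]
print(arithmetic_mean(values))
print(geometric_mean(values))
print(harmonic_mean(values))
print(f1_score(0.5, 1.0))
```

For positive inputs the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean. This is why the F1-score is pulled towards the smaller of precision and recall.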

## Heteroscedasticity

When the variance of a model's error is not constant but depends on one or more of the features.
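One way to see heteroscedasticity is to check whether the spread of the errors changes with a feature. The sketch below uses synthetic errors whose standard deviation is assumed to grow in proportion to a single feature `x`; the data, the split point and the seed are all made up for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic feature values and model errors. The error standard deviation
# is 0.5 * x, so the errors spread out as the feature grows.
xs = [random.uniform(1.0, 10.0) for _ in range(2000)]
errors = [random.gauss(0.0, 0.5 * x) for x in xs]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Compare the error variance for small and large values of the feature.
low = [e for x, e in zip(xs, errors) if x < 5.5]
high = [e for x, e in zip(xs, errors) if x >= 5.5]
print(variance(low), variance(high))
```

If the errors were homoscedastic, the two printed variances would be roughly equal; here the second is clearly larger.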

## Moving average

A moving average smooths a sequence of observations.

### Exponential moving average (EMA)

A type of moving average in which the influence of past observations on the current average diminishes exponentially with time.

$$m_t = \alpha m_{t-1} + (1 - \alpha) x_t$$

$m_t$ is the moving average at time $t$, $x_t$ is the input at time $t$ and $\alpha$ is a hyperparameter between 0 and 1. As $\alpha$ decreases, the moving average weights recent observations more strongly.
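A minimal sketch of the update, assuming the common form in which the previous average is weighted by `alpha` and the new input by `1 - alpha`. The function name, the `initial` parameter and the example signal are illustrative.

```python
def exponential_moving_average(xs, alpha, initial=0.0):
    """Return the averages m_1..m_n, where m_t = alpha * m_{t-1} + (1 - alpha) * x_t."""
    m = initial
    averages = []
    for x in xs:
        m = alpha * m + (1 - alpha) * x
        averages.append(m)
    return averages

# A flat signal with a single spike at index 3.
signal = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0]

# A large alpha keeps most of the old average, so the spike is damped.
smooth = exponential_moving_average(signal, alpha=0.9, initial=signal[0])

# A small alpha follows the latest input closely.
reactive = exponential_moving_average(signal, alpha=0.1, initial=signal[0])

print(smooth)
print(reactive)
```

Starting from the first observation rather than zero is one simple way to avoid a poor initial value.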

#### Bias correction

If we initialise the EMA to zero ($m_0 = 0$), it will be strongly biased towards zero for the first few steps. To correct this we can start with $\alpha$ close to 0 and gradually increase it. This effect can be achieved by keeping the usual update $m_t = \alpha m_{t-1} + (1 - \alpha) x_t$ and reporting the rescaled value:

$$\hat{m}_t = \frac{m_t}{1 - \alpha^t}$$

See Adam: A Method for Stochastic Optimization, Kingma and Ba (2015) for an example of this bias correction being used in practice.
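The correction can be sketched as follows, assuming the Adam-style form in which the zero-initialised average is divided by `1 - alpha ** t` at step `t` (counting from 1). The function name is hypothetical and `alpha` is assumed to be strictly less than 1.

```python
def bias_corrected_ema(xs, alpha):
    """Zero-initialised EMA with each output rescaled by 1 / (1 - alpha**t)."""
    m = 0.0  # uncorrected running average, biased towards zero early on
    corrected = []
    for t, x in enumerate(xs, start=1):
        m = alpha * m + (1 - alpha) * x
        corrected.append(m / (1 - alpha ** t))
    return corrected

constant = [3.0] * 5
print(bias_corrected_ema(constant, alpha=0.9))
```

For a constant input the uncorrected average after one step would be only `(1 - alpha)` times the input, while the corrected values stay at the input value from the first step onwards, up to floating-point rounding.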

## Point estimate

An estimate of a parameter, such as the mean of a population. It describes the belief about this quantity with a single number, in contrast with a distribution, which describes the belief about the parameter over a range of possible values.

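## Skewness

Measures the asymmetry of a probability distribution. A common definition is the standardised third moment:

$$\operatorname{skew}(X) = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$

where $\mu$ is the mean of $X$ and $\sigma$ is its standard deviation.

## Standard deviation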

The square root of the variance. The formula is:

$$\sigma = \sqrt{E[(X - \mu)^2]}$$

where $\mu$ is the mean of $X$.

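### Sample standard deviation

$$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Note that the above is a biased estimator of the population standard deviation: although it uses the $n - 1$ correction, taking the square root reintroduces bias. Estimators which are unbiased exist but they each only apply to some population distributions.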
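The difference between the population and sample forms can be checked with a short sketch. It assumes the sample version divides by `n - 1`, and compares hand-written functions against `statistics.pstdev` and `statistics.stdev` from the Python standard library. The data is made up.

```python
import math
import statistics

def population_std(xs):
    """Square root of the mean squared deviation from the mean."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def sample_std(xs):
    """Same as above but dividing by n - 1 instead of n."""
    x_bar = sum(xs) / len(xs)
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_std(data), statistics.pstdev(data))
print(sample_std(data), statistics.stdev(data))
```

The sample value is always slightly larger than the population value on the same data, and the gap shrinks as the number of observations grows.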

## Variance

The variance of $X$ is:

$$\operatorname{Var}(X) = E[(X - \mu)^2]$$

where $\mu$ is the mean of $X$.

The formula can also be written as:

$$\operatorname{Var}(X) = E[X^2] - E[X]^2$$

### Sample variance

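When it is impractical to compute the variance over the entire population, we can take a sample instead and compute the sample variance. The formula for the unbiased sample variance is:

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $\bar{x}$ is the mean of the sample.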
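The two ways of writing the population variance, and the unbiased sample variance with its `n - 1` divisor, can be compared with a small sketch. The function names and data are illustrative only.

```python
def population_variance(xs):
    """Mean squared deviation from the mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def population_variance_alt(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def sample_variance(xs):
    """Unbiased sample variance: divide by n - 1 rather than n."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [1.0, 3.0, 5.0, 7.0]
print(population_variance(data))
print(population_variance_alt(data))
print(sample_variance(data))
```

The two population forms agree mathematically, but the second subtracts two potentially large numbers and can lose precision in floating point when the mean is large relative to the spread.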