# Regression

## Confidence intervals

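The confidence interval for a point estimate is the interval within which we have a particular degree of confidence that the true value lies. More precisely, a 95% confidence interval comes from a procedure that, over repeated samples, produces intervals containing the true value 95% of the time. For example, the 95% confidence interval for the mean height in a population may be [1.78 m, 1.85 m].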

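Confidence intervals for a sample mean can be calculated in this way:

1. Let $\alpha$ be the significance level, so that the confidence level is $1 - \alpha$; e.g. $\alpha = 0.05$ for the 95% confidence level.
2. Let $f(x)$ be the pdf for Student’s t distribution, parameterised by the number of degrees of freedom, which is the sample size ($n$) minus 1.
3. Calculate $c$ such that $\int_{-c}^{c} f(x)\,dx = 1 - \alpha$, i.e. $c = F^{-1}(1 - \alpha/2)$, where $F$ is the CDF of the same distribution.
4. Then the confidence interval for the point estimate is:

$$\left[\hat{\mu} - \frac{c\,s}{\sqrt{n}},\ \hat{\mu} + \frac{c\,s}{\sqrt{n}}\right]$$

where $\hat{\mu}$ is the estimated value of the statistic, $\mu$ is the true value the interval is meant to contain, and $s$ is the sample standard deviation.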
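A minimal Python sketch of this recipe, assuming the statistic is a sample mean. To stay dependency-free it finds the critical value $c$ (the value with $\int_{-c}^{c} f(x)\,dx = 1 - \alpha$) by integrating the t pdf with Simpson's rule and bisecting; with SciPy available, `scipy.stats.t.ppf(1 - alpha / 2, df)` would normally be used instead. The function names and the height sample are illustrative.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density f(x) of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def central_mass(c: float, df: int, steps: int = 2000) -> float:
    """Integral of f over [-c, c] using composite Simpson's rule (steps is even)."""
    h = 2 * c / steps
    total = t_pdf(-c, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-c + i * h, df)
    return total * h / 3


def critical_value(alpha: float, df: int) -> float:
    """Find c such that the integral of f over [-c, c] equals 1 - alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while central_mass(hi, df) < 1 - alpha:  # widen until the target is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_mass(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(sample, alpha=0.05):
    """t-based interval [mu_hat - c*s/sqrt(n), mu_hat + c*s/sqrt(n)] for the mean."""
    n = len(sample)
    mu_hat = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    c = critical_value(alpha, n - 1)
    half_width = c * s / math.sqrt(n)
    return mu_hat - half_width, mu_hat + half_width


heights = [1.80, 1.82, 1.79, 1.85, 1.81]  # made-up heights in metres
print(mean_confidence_interval(heights))
```

For $n = 10$ and $\alpha = 0.05$ the critical value comes out at about 2.262, the familiar t-table entry for 9 degrees of freedom.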

## Isotonic regression

Fits a step-wise monotonic function to the data. A useful way to avoid overfitting if there is a strong theoretical reason to believe that the function is monotonic. For example, the relationship between the floor area of houses and their prices.
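The section doesn't say how the step function is found. One standard method is the pool-adjacent-violators algorithm (PAVA), sketched below in plain Python under the assumptions that the points are already sorted by the feature, all points have equal weight, and a non-decreasing fit is wanted. `isotonic_fit` and the toy prices are hypothetical; scikit-learn's `sklearn.isotonic.IsotonicRegression` is a library implementation.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool adjacent violators (PAVA).

    Assumes `y` is already ordered by the feature (e.g. houses sorted by floor area).
    """
    blocks = []  # each block is [mean of pooled values, number of values pooled]
    for value in y:
        blocks.append([float(value), 1])
        # Pool backwards while the previous block's mean exceeds this one's
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, count2 = blocks.pop()
            mean1, count1 = blocks.pop()
            count = count1 + count2
            blocks.append([(mean1 * count1 + mean2 * count2) / count, count])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)  # every point in a block gets the block mean
    return fitted


prices = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]  # toy prices, sorted by floor area
print(isotonic_fit(prices))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each run of points that breaks monotonicity is replaced by its mean, which is what produces the flat steps.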

## Linear regression

The simplest form of regression. Estimates a model with the equation:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

where the $\beta_j$ are parameters to be estimated by the model and the $x_j$ are the features.

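The loss function is usually the squared error summed over the $n$ observations, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the model's prediction for observation $i$.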
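A small sketch of the model equation and its squared-error loss in plain Python; the names `predict` and `squared_error` are illustrative, and `beta[0]` is assumed to hold the intercept $\beta_0$.

```python
def predict(beta, x):
    """y_hat = beta_0 + beta_1 * x_1 + ... + beta_k * x_k for one observation x."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def squared_error(beta, X, y):
    """Sum of squared residuals (y_i - y_hat_i)^2 over all observations."""
    return sum((yi - predict(beta, xi)) ** 2 for xi, yi in zip(X, y))


X = [[1.0], [2.0], [3.0]]  # one feature, three observations
y = [3.0, 5.0, 7.0]        # generated exactly by y = 1 + 2 * x_1
print(squared_error([1.0, 2.0], X, y))  # 0.0
print(squared_error([0.0, 2.0], X, y))  # 3.0
```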

### Normal equation

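The closed-form equation that gives the optimal parameters for a linear regression under the squared-error loss.

Rewrite the regression equation as $y = X\beta$, where $X$ is the $n \times (k+1)$ design matrix (one row per observation, with a leading column of ones for the intercept $\beta_0$), $\beta$ is the vector of parameters and $y$ is the vector of targets.

Then the formula for $\beta$ which minimizes the squared error is:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

This requires $X^T X$ to be invertible.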
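A dependency-free sketch of the normal equation, assuming the features are passed without the intercept column (it is prepended inside). Rather than forming $(X^T X)^{-1}$ explicitly it solves $(X^T X)\beta = X^T y$ by Gaussian elimination, which is the usual numerically safer choice; in NumPy one would typically reach for `numpy.linalg.solve` or `numpy.linalg.lstsq`. The function names and data are hypothetical.

```python
def solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [val] for row, val in zip(M, v)]  # augmented matrix [M | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        b[r] = (aug[r][n] - sum(aug[r][c] * b[c] for c in range(r + 1, n))) / aug[r][r]
    return b


def normal_equation(features, y):
    """beta_hat solving (X^T X) beta = X^T y; a column of ones is prepended for beta_0."""
    X = [[1.0, *row] for row in features]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


features = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1, 4, 3, 6, 6]  # generated exactly by y = 1 + 2 * x_1 - 1 * x_2
print(normal_equation(features, y))  # approximately [1.0, 2.0, -1.0]
```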

## Logistic regression

Used for modelling probabilities. It uses the sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, to ensure the predicted values are between 0 and 1; values outside this range would not make sense when predicting a probability. The functional form is:

$$P(y = 1 \mid x) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$$
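A sketch of the functional form in Python. The sigmoid is split on the sign of $z$ so that `math.exp` is only called with non-positive arguments and cannot overflow. The parameter values are made up for illustration; how $\beta$ is fitted is not covered here.

```python
import math


def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^{-z}), arranged so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so ez is in (0, 1)
    return ez / (1.0 + ez)


def predict_proba(beta, x):
    """P(y = 1 | x) = sigma(beta_0 + beta_1 * x_1 + ... + beta_k * x_k)."""
    z = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return sigmoid(z)


beta = [-1.0, 0.5]  # hypothetical parameters for a single feature
for x1 in (-10.0, 2.0, 10.0):
    print(x1, predict_proba(beta, [x1]))
```

At $x_1 = 2$ the linear part is exactly 0, so the predicted probability is 0.5; large positive or negative inputs are squashed towards 1 or 0.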

## Multicollinearity

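When one of the features is a linear function (exactly or approximately) of one or more of the others. Exact multicollinearity makes $X^T X$ singular, so the normal equation has no unique solution; approximate multicollinearity gives the coefficient estimates large variances, making them unstable and hard to interpret.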
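A quick way to see why exact multicollinearity matters for the normal equation: if one feature is a linear combination of others, $X^T X$ is singular, so $(X^T X)^{-1}$ does not exist. This sketch computes the determinant exactly with `fractions.Fraction` to avoid rounding noise; the helper names and data are made up.

```python
from fractions import Fraction


def gram(X):
    """X^T X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def determinant(M):
    """Exact determinant using Gaussian elimination over fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no usable pivot: the matrix is singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det


x1 = [1, 2, 3, 4]
x2 = [2, 0, 1, 3]
x3 = [2 * a + b for a, b in zip(x1, x2)]  # x3 is an exact linear function of x1, x2

independent = [[1, a, b] for a, b in zip(x1, x2)]            # intercept, x1, x2
collinear = [[1, a, b, c] for a, b, c in zip(x1, x2, x3)]    # adds x3
print(determinant(gram(independent)))  # non-zero
print(determinant(gram(collinear)))    # 0
```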

## P-values

Measure the statistical significance of the coefficients of a regression. The closer the p-value is to 0, the more statistically significant that result is.

The p-value is the probability of seeing an effect greater than or equal to the one observed if there is in fact no relationship.
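One way to make this definition concrete is a permutation test, which is a different technique from the t-test formula usually quoted for regression coefficients. Shuffling $y$ destroys any relationship with $x$, so the shuffled slopes approximate what we would see if there were no relationship, and the p-value is estimated as the share of them at least as large (in absolute value) as the observed slope. A sketch for a single feature, with hypothetical function names:

```python
import random


def slope(x, y):
    """Least-squares slope of y on a single feature x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate P(|slope| >= observed |slope|) when y is unrelated to x."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any x-y relationship
        if abs(slope(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction counts the observed data as one of the permutations,
    # so the estimate is never exactly 0
    return (hits + 1) / (n_permutations + 1)


x = list(range(10))
y = [2 * v + 1 for v in x]  # a perfect linear relationship
print(permutation_p_value(x, y))
```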

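In a regression, the p-value of a coefficient $\beta_j$ is usually obtained from a t-test of the null hypothesis $\beta_j = 0$:

$$t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}, \qquad p_j = 2\left(1 - F_{n-k-1}\left(|t_j|\right)\right)$$

where $F_{n-k-1}$ is the CDF of Student’s t distribution with $n - k - 1$ degrees of freedom, $n$ is the number of observations, $k$ is the number of features, and the standard error is $\operatorname{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(X^T X)^{-1}\right]_{jj}}$ with $\hat{\sigma}^2 = \frac{1}{n-k-1}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.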
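A sketch of the usual t-test for the slope in a regression with a single feature ($k = 1$, so $n - 2$ degrees of freedom), in plain Python. The t CDF is evaluated by numerical integration to avoid a SciPy dependency (`scipy.stats.t.sf` would be the usual call); the function names and data are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t: float, df: int, steps: int = 4000) -> float:
    """P(|T| >= |t|) = 2 * (1 - F(|t|)), computed as 1 minus Simpson's-rule mass on [-|t|, |t|]."""
    c = abs(t)
    if c == 0.0:
        return 1.0
    h = 2 * c / steps
    total = t_pdf(-c, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-c + i * h, df)
    return max(0.0, 1.0 - total * h / 3)  # clamp tiny negative rounding error


def slope_p_value(x, y):
    """t statistic and p-value for H0: beta_1 = 0 in y = beta_0 + beta_1 * x (k = 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    beta0 = my - beta1 * mx
    rss = sum((b - beta0 - beta1 * a) ** 2 for a, b in zip(x, y))
    sigma2 = rss / (n - 2)        # n - k - 1 degrees of freedom with k = 1
    se = math.sqrt(sigma2 / sxx)  # standard error of beta1
    t = beta1 / se
    return t, two_sided_p_value(t, n - 2)


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(slope_p_value(x, y))
```

With a single feature, $\left[(X^T X)^{-1}\right]_{11}$ reduces to $1 / \sum_i (x_i - \bar{x})^2$, which is why the code divides $\hat{\sigma}^2$ by `sxx`.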