Regression
Confidence intervals
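The confidence interval for a point estimate is the interval within which we have a particular degree of confidence that the true value lies. More precisely, if the sampling were repeated many times, 95% of the 95% confidence intervals constructed this way would contain the true value. For example, the 95% confidence interval for the mean height in a population may be [1.78m, 1.85m].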
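Confidence intervals for a mean can be calculated as follows:
- Let $\alpha$ be one minus the specified confidence level, e.g. $\alpha = 0.05$ for the 95% confidence level.
- Let $F$ be the cumulative distribution function (CDF) of Student's t distribution, parameterised by the number of degrees of freedom, which is the sample size ($n$) minus 1.
- Calculate $c = F^{-1}(1 - \alpha/2)$.
- Then the confidence interval for the point estimate is:

$$\hat{x} - c\frac{s}{\sqrt{n}} \leq x \leq \hat{x} + c\frac{s}{\sqrt{n}}$$

where $\hat{x}$ is the estimated value of the statistic (here the sample mean), $x$ is the true value and $s$ is the sample standard deviation.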
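Below is a minimal pure-Python sketch of the steps above, assuming the statistic is a sample mean. To stay dependency-free, it approximates the t CDF with Simpson's rule and inverts it by bisection. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally replace `t_ppf`. The helper names (`t_pdf`, `t_cdf`, `t_ppf`, `mean_confidence_interval`) are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF F(x): 0.5 plus/minus the area under the pdf between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(q, df):
    """Inverse CDF F^{-1}(q) found by bisection; assumes 0.5 <= q < 1."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < q:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(sample, level=0.95):
    """Return (lower, upper) bounds of the t-based interval for the mean."""
    n = len(sample)
    x_hat = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    alpha = 1 - level
    c = t_ppf(1 - alpha / 2, n - 1)
    half_width = c * s / math.sqrt(n)
    return x_hat - half_width, x_hat + half_width
```

For example, `mean_confidence_interval([1.80, 1.83, 1.79, 1.85, 1.81])` gives a 95% interval centred on the sample mean of 1.816m. The heights are made-up illustration data.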
Isotonic regression
Fits a step-wise monotonic function to the data. A useful way to avoid overfitting if there is a strong theoretical reason to believe that the function is monotonic. For example, the relationship between the floor area of houses and their prices.
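The text does not say how the step function is fitted. One standard method is the pool adjacent violators algorithm (PAVA), sketched here in plain Python for a non-decreasing least-squares fit. The sketch assumes the targets are already sorted by the feature, e.g. house prices ordered by floor area. `isotonic_fit` is a hypothetical name; scikit-learn users would more likely reach for `sklearn.isotonic.IsotonicRegression`.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y (already sorted by the feature)
    using the pool adjacent violators algorithm."""
    blocks = []  # each block is [sum of pooled values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Pool backwards while the previous block's mean exceeds the newest one's
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Every point in a block is fitted with that block's mean, giving a step function
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

Each violation of monotonicity is resolved by replacing the offending neighbours with their average. This is why the result is flat over some ranges and never decreases.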
Linear regression
The simplest form of regression. Estimates a model with the equation:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

where the $\beta_i$ are parameters to be estimated by the model and the $x_i$ are the features.
The loss function is usually the squared error.
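To make the model and its loss concrete, here is a small sketch that evaluates the prediction for one example and the summed squared error over a dataset. The function names and the toy data (generated from $y = 1 + 2x$) are illustrative assumptions. The parameter list `beta` stores the intercept $\beta_0$ first.

```python
def predict(beta, x):
    """y_hat = beta_0 + beta_1 * x_1 + ... + beta_k * x_k for one example x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def squared_error(beta, X, y):
    """Sum of squared differences between the targets and the predictions."""
    return sum((yi - predict(beta, xi)) ** 2 for xi, yi in zip(X, y))


# Hypothetical data: one feature, targets generated by y = 1 + 2x
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(squared_error([1.0, 2.0], X, y))  # 0.0: these parameters fit exactly
print(squared_error([0.0, 2.0], X, y))  # 4.0: the intercept is off by 1 on each of 4 points
```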
Normal equation
The equation that gives the optimal parameters for a linear regression.
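Rewrite the regression equation in matrix form as $\hat{y} = X\beta$. Each row of $X$ holds one example's features preceded by a 1, so that $\beta_0$ acts as the intercept, and $y$ is the vector of observed targets.
Then the value of $\beta$ which minimizes the squared error is:

$$\beta = (X^T X)^{-1} X^T y$$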
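The following is a dependency-free sketch of the normal equation. Rather than forming $(X^T X)^{-1}$ explicitly, it solves the equivalent linear system $(X^T X)\beta = X^T y$ by Gaussian elimination, which is the usual numerically safer choice. With NumPy one would typically call `numpy.linalg.lstsq` instead. `normal_equation` is a hypothetical name, and it prepends the column of ones for the intercept itself.

```python
def normal_equation(X, y):
    """Least-squares parameters [beta_0, ..., beta_k] for features X and targets y."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # column of ones -> intercept
    k = len(A[0])
    # Build X^T X and X^T y
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    # Solve (X^T X) beta = X^T y by Gaussian elimination with partial pivoting
    M = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (e.g. perfectly collinear features)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (M[i][k] - sum(M[i][j] * beta[j] for j in range(i + 1, k))) / M[i][i]
    return beta
```

For example, `normal_equation([[0], [1], [2], [3]], [1, 3, 5, 7])` recovers an intercept of about 1 and a slope of about 2.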
Logistic regression
Used for modelling probabilities. It uses the sigmoid function ($\sigma(z) = \frac{1}{1 + e^{-z}}$) to ensure the predicted values are between 0 and 1. Values outside of this range would not make sense when predicting a probability. The functional form is:

$$\hat{y} = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$$
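Here is a sketch of the functional form. The sigmoid is written in a numerically stable way, so that large negative inputs do not make `math.exp` overflow. The function names and the convention that `beta[0]` is the intercept are assumptions; fitting the $\beta_i$ (usually by minimising log loss) is not shown.

```python
import math


def sigmoid(z):
    """Logistic function; mathematically maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe for very negative z: underflows to 0.0 instead of overflowing
    return ez / (1.0 + ez)


def predict_probability(beta, x):
    """Linear combination as in linear regression, squashed into a probability."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)
```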
Multicollinearity
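When one of the features is a linear function of one or more of the others (perfect multicollinearity), or very nearly so. In the perfect case the matrix $X^T X$ in the normal equation is singular, so there is no unique least-squares solution. In the near case the coefficient estimates become unstable and their standard errors large.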
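The sketch below checks for the exact case. It tests whether the design matrix (the features plus a column of ones) has full column rank, using Gaussian elimination with a tolerance. Including the ones column means a feature such as $x_2 = 2x_1 + 3$ is also flagged. Near-multicollinearity is usually assessed with other diagnostics, such as variance inflation factors, which this sketch does not compute. The function names and the tolerance value are assumptions.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    M = [[float(v) for v in r] for r in rows]
    n_cols = len(M[0]) if M else 0
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, len(M)), key=lambda r: abs(M[r][col]), default=None)
        if pivot is None or abs(M[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, len(M)):
            factor = M[r][col] / M[rank][col]
            for c in range(col, n_cols):
                M[r][c] -= factor * M[rank][c]
        rank += 1
    return rank


def has_perfect_multicollinearity(X):
    """True if some column of [1, features] is a linear combination of the others."""
    A = [[1.0] + list(row) for row in X]  # intercept column catches constant offsets
    return matrix_rank(A) < len(A[0])
```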
P-values
Measure the statistical significance of the coefficients of a regression. The closer the p-value is to 0, the more statistically significant that result is.
The p-value is the probability of seeing an effect at least as extreme as the one observed if there is in fact no relationship, i.e. under the null hypothesis that the coefficient is zero.
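In an ordinary least-squares regression with $n$ observations and $k$ features plus an intercept, the p-value of coefficient $\beta_j$ is usually computed from a t-statistic:

$$t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}, \qquad \mathrm{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(X^T X)^{-1}\right]_{jj}}, \qquad \hat{\sigma}^2 = \frac{\sum_i (y_i - \hat{y}_i)^2}{n - k - 1}$$

$$p = 2\left(1 - F_{n-k-1}(|t_j|)\right)$$

where $X$ is the design matrix (with a column of ones for the intercept) and $F_{n-k-1}$ is the CDF of Student's t distribution with $n - k - 1$ degrees of freedom.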
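The sketch below covers the simplest case: one feature plus an intercept. There $k = 1$, so there are $n - 2$ degrees of freedom, and the slope's standard error reduces to $\sqrt{\hat{\sigma}^2 / \sum_i (x_i - \bar{x})^2}$. To avoid dependencies, the two-sided p-value $2(1 - F(|t|))$ is computed as one minus twice the area under the t density between $0$ and $|t|$, using Simpson's rule. Statistical packages such as statsmodels report the same quantity for OLS fits. The function names and the data are illustrative assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (area under the pdf on [0, |t|]), via Simpson's rule."""
    a = abs(t_stat)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def slope_p_value(x, y):
    """Fit y = b0 + b1 * x and return (b1, t statistic, two-sided p-value).
    Assumes the fit is not perfect (residual sum of squares > 0)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # n - k - 1 with k = 1
    t_stat = b1 / math.sqrt(sigma2 / sxx)
    return b1, t_stat, two_sided_p_value(t_stat, n - 2)


# Hypothetical data with a strong linear trend
b1, t_stat, p = slope_p_value([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

With a slope close to 2 and small residuals, the t-statistic is large and the p-value is very close to 0. By the description above, that makes the coefficient highly significant.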