Linear regression
Gist
A model to describe the linear relationship between a dependent variable (the response) and a predictor.
Mathematics
Standard
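For the simplest linear regression:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
where $y_i$ is the response, $x_i$ is the predictor, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is the error term.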
Matrix notation
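We can also present the regression in matrix form:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where $\mathbf{X}$ is the design matrix (a column of ones for the intercept plus one column per predictor), $\boldsymbol{\beta}$ is the vector of coefficients, and $\boldsymbol{\epsilon}$ is the vector of errors.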
Key Assumptions
- Linearity in terms of the coefficients: adding polynomial terms such as $x^2$ is therefore still valid for a linear regression.
- The error term at each value of the predictor is normally distributed, though this assumption can be relaxed if the model is used more for prediction than for inference:[1]
"Although this study strongly recommends the appropriate use of normality tests in linear modelling—which is to evaluate the residuals and not the raw data for normality—our simulations also show that if a normality test is applied to raw data, the subsequent choice of a parametric or non-parametric test has little difference in power." [2]
- The variance of the error is constant; if not, the model exhibits heteroscedasticity. Gelman suggests this is usually not a major problem, and weighted regression can be used to address it. [3]
- Errors are independent (i.e., there is no autocorrelation).
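The first assumption above (linearity in the coefficients, not in the predictor) can be illustrated with a minimal sketch. The data are simulated, and the true coefficients 1.0, 2.0, and 0.5 and the noise level are assumptions made for the example only. The quadratic column makes the fit curved in x, but ordinary least squares still applies because the model is a linear combination of the columns.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data are illustrative only
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Design matrix with columns [1, x, x^2]:
# nonlinear in x, but linear in the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of the intercept, linear, and quadratic coefficients
```

The estimates should land close to the simulated coefficients, which is the sense in which a polynomial term is "valid" here.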
Parameterization
The coefficients can be estimated by least squares, derived either with calculus[4] or with linear algebra (see the Least squares page for the derivation).
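As a rough sketch of the two routes, the snippet below estimates a simple regression both ways on simulated data and shows that they agree. The data-generating values (intercept 3.0, slope 1.5) and the variable names are assumptions for illustration. The calculus route uses the closed-form solution obtained by setting the derivatives of the sum of squared errors to zero. The linear algebra route solves the normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed; the data are illustrative only
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)

# Calculus route: closed-form minimizer of the sum of squared errors.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Linear algebra route: solve the normal equations (X^T X) beta = X^T y.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(intercept, slope)
print(beta)  # same two numbers, up to floating-point error
```

In practice a solver such as `np.linalg.lstsq` is usually preferred over forming `X.T @ X` explicitly, since it is numerically more stable when predictors are nearly collinear.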
Interpretation
Diagnostics
Check the Residuals
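A minimal sketch of a few numerical residual checks, assuming simulated data and a simple one-predictor model. The split-in-half spread comparison is an informal stand-in for a proper heteroscedasticity test, and normality is usually checked visually with a Q-Q plot or histogram of the residuals, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed; the data are illustrative only
x = np.linspace(0, 10, 200)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=x.size)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# With an intercept in the model, OLS residuals average to zero.
mean_resid = residuals.mean()

# Constant variance: compare residual spread in the lower and upper halves of x.
# A ratio far from 1 hints at heteroscedasticity.
half = x.size // 2
spread_ratio = residuals[half:].std() / residuals[:half].std()

# Independence: Durbin-Watson statistic, which is close to 2 when there is
# no first-order autocorrelation in the residuals.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(mean_resid, spread_ratio, dw)
```

These numbers complement a plot of residuals against fitted values and do not replace it, since patterns such as curvature are easier to see than to summarize.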
References
1,2 ↩︎
Midway, S., & White, J. W. (2025). Testing for normality in regression models: mistakes abound (but may not matter). Royal Society Open Science, 12(4), 241904. ↩︎
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press. ↩︎
3 ↩︎