Linear regression
Gist
The most basic technique, and the most powerful. Learn it once, then keep coming back to it, because you learn something new each time.
Mathematics
Standard format
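For the simplest linear regression:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where $y_i$ is the response, $x_i$ is the predictor, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error term.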
Matrix notation
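We can also present the regression in matrix form:
$$y = X\beta + \varepsilon$$
where $y$ is the vector of responses, $X$ is the design matrix (a column of ones plus the predictors), $\beta$ is the coefficient vector, and $\varepsilon$ is the vector of errors.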
Assumptions
- Linearity in terms of the coefficients: adding polynomial terms (Polynomial (linear regression)), such as $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$, is still valid for a linear regression.
- The error term at each value of the predictor is normally distributed, though this assumption can be relaxed if the model is used more for prediction than for inference (see: )
- Many people erroneously assume that the response/predictor values themselves must be normal. Simulation studies suggest that testing the raw data rather than the residuals ultimately makes little difference:
"Although this study strongly recommends the appropriate use of normality tests in linear modelling—which is to evaluate the residuals and not the raw data for normality—our simulations also show that if a normality test is applied to raw data, the subsequent choice of a parametric or non-parametric test has little difference in power." (Midway and White, 2025)
- The variance of the errors is constant; if not, the model is said to exhibit heteroskedasticity.
- Errors are independent (i.e., there is no autocorrelation).
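The assumptions above can be probed on a fitted model. The following is a minimal sketch, assuming NumPy is available and using simulated data (the data, seed, and variable names are all hypothetical). It fits a model with a squared term, which is still linear in the coefficients, and then runs two rough checks on the residuals rather than on the raw response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a quadratic trend with independent, constant-variance noise.
n = 200
x = np.linspace(-2.0, 2.0, n)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.3, size=n)

# Columns 1, x, x^2: nonlinear in x, but still linear in the coefficients.
X = np.column_stack([np.ones(n), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Diagnostics are computed on the residuals, not on the raw y values.
residuals = y - X @ beta_hat

# Rough constant-variance check: residual spread in the lower vs upper half of x.
var_ratio = residuals[: n // 2].var() / residuals[n // 2 :].var()

# Durbin-Watson statistic: values near 2 suggest no lag-1 autocorrelation.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals**2)

print("coefficients:", beta_hat)
print("variance ratio:", var_ratio)
print("Durbin-Watson:", durbin_watson)
```

These are informal screens only. A variance ratio far from 1 or a Durbin-Watson value far from 2 would be a reason to look more closely, not a formal test result.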
Parameterization
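We use least squares estimates, which can be derived by minimizing the residual sum of squares:
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
or, in matrix form:
$$\hat\beta = (X^\top X)^{-1} X^\top y$$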
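The scalar and matrix routes to the least squares estimates can be checked against each other numerically. This is a minimal sketch, assuming NumPy and simulated data (the true coefficients 2 and 3 are arbitrary choices for illustration). The scalar version uses the standard closed-form solution for one predictor, and the matrix version solves the normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data generated from y = 2 + 3x + noise.
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

# Scalar form: closed-form solution of the two first-order conditions.
x_bar, y_bar = x.mean(), y.mean()
slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
intercept = y_bar - slope * x_bar

# Matrix form: solve the normal equations (X^T X) beta = X^T y.
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print("scalar form:", intercept, slope)
print("matrix form:", beta_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse. For ill-conditioned design matrices, `np.linalg.lstsq` is usually the safer choice.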
Interpretation
I realized that even the r
- With Interactions
Note that it is not necessary to “hold constant” all other variables to be able to interpret the effect of one predictor. It is sufficient to hold constant the weighted sum of all the variables other than $X_j$. And in many cases it is not physically possible to hold other variables constant while varying one, e.g., when a model contains $X$ and $X^2$.
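To make the last point concrete, here is a short worked derivation, assuming a hypothetical model in which a single predictor enters as both $X$ and $X^2$ (the coefficient names are illustrative only):

```latex
\begin{aligned}
E[Y \mid X] &= \beta_0 + \beta_1 X + \beta_2 X^2 \\
E[Y \mid X = a + 1] - E[Y \mid X = a]
  &= \beta_1 \bigl[(a+1) - a\bigr] + \beta_2 \bigl[(a+1)^2 - a^2\bigr] \\
  &= \beta_1 + \beta_2 (2a + 1)
\end{aligned}
```

A one-unit change in $X$ therefore shifts the prediction by an amount that depends on the starting value $a$. Under this model, $\beta_1$ alone cannot be read as "the effect of $X$", because $X^2$ cannot stay fixed while $X$ varies.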
Coding
R
lm()
Python
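A minimal NumPy-only sketch of a simple fit, loosely analogous to a one-predictor `lm()` call. The helper name `fit_ols` and the toy data are hypothetical. In practice a statistics package such as statsmodels would typically be used to also get standard errors and tests, which this sketch does not compute.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return (coefficients, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return beta_hat, 1.0 - ss_res / ss_tot

# Hypothetical toy data lying exactly on y = 1 + 2x.
coef, r_squared = fit_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(coef, r_squared)
```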
Resources
- Midway, S., & White, J. W. (2025). Testing for normality in regression models: mistakes abound (but may not matter). Royal Society Open Science, 12(4), 241904.