Linear Regression
Predicts the target as a weighted sum of the feature inputs.
Assumptions
- Linearity (no interactions and no non-linearities)
- Normality (target variable follows a Normal Distribution given the features)
- Homoscedasticity (constant variance of the error term)
- violated e.g. when the price variance of expensive houses is much higher than the price variance of cheap houses
- Independence
- if not (e.g. repeated measurements of the same subject), use Mixed Effect Models or GEEs
- Fixed Features (there is no measurement error or uncertainty in the data)
- otherwise a more complex model has to be used to account for the possible measurement error
- Uncorrelated Features (no strong Correlation between features)
- with correlated features it becomes hard to tell which of them actually has the effect on the target
- use PCA to get less correlated features (see the check sketch after this list)
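A minimal sketch of checking two of these assumptions with statsmodels: homoscedasticity via the Breusch-Pagan test and feature correlation via variance inflation factors. The data here is synthetic and only stands in for a real dataset:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic stand-in for real features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_const = sm.add_constant(X)  # prepend an intercept column
fit = sm.OLS(y, X_const).fit()

# Homoscedasticity: Breusch-Pagan test (a small p-value suggests heteroscedasticity)
_, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X_const)
print("Breusch-Pagan p-value:", bp_pvalue)

# Uncorrelated features: a VIF far above 10 hints at problematic multicollinearity
vifs = [variance_inflation_factor(X_const, i) for i in range(1, X_const.shape[1])]
print("VIFs:", vifs)
```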
Problem to solve
Given some datapoints $x^{(i)}$ with targets $y^{(i)}$, find weights $\beta$ that minimize the error $\sum_i \big(y^{(i)} - \beta^T x^{(i)}\big)^2$.
There is a closed-form solution, $\hat{\beta} = (X^T X)^{-1} X^T y$ (see below), or you can use Gradient Descent to calculate the weights by minimizing the Mean Squared Error.
If our data has Gaussian noise, we can even get confidence intervals for $\beta$.
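A small sketch of the closed-form solution with NumPy; the toy data and true coefficients are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
# Closed form beta_hat = (X^T X)^{-1} X^T y, computed stably via least squares
beta_hat, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(beta_hat)  # approximately [3.0, 1.5, -2.0]
```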
Explanations
Linear Regression has Intrinsic explanations.
We can get Explanations from a Linear Regression model via the estimated weights.
- Numerical
- one unit change → outcome changes by its weight
- Binary
- changing from 0 to 1 → outcome changes by weight
- Categorical
- One-Hot-Encoding → Binary
- Intercept
- Predicted value when all features are at their mean (when standardized)
- otherwise: predicted value when all features are zero
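A hypothetical example of reading such explanations off the fitted weights; the feature names (size_m2, has_balcony) and the data-generating process are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
size_m2 = rng.uniform(40, 160, size=300)    # numerical feature
has_balcony = rng.integers(0, 2, size=300)  # binary feature
price = (50_000 + 2_000 * size_m2 + 15_000 * has_balcony
         + rng.normal(scale=5_000, size=300))

model = LinearRegression().fit(np.column_stack([size_m2, has_balcony]), price)
w_size, w_balcony = model.coef_

print(f"+1 m^2 of size -> price changes by {w_size:,.0f}")     # numerical weight
print(f"balcony 0 -> 1 -> price changes by {w_balcony:,.0f}")  # binary weight
print(f"intercept (all features zero): {model.intercept_:,.0f}")
```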
Feature Importance can be calculated with the t-statistic $t_{\hat{\beta}_j} = \hat{\beta}_j / SE(\hat{\beta}_j)$: the importance increases with the feature's weight and decreases with the uncertainty (standard error) of its estimate.
We can plot the importances with a Weight Plot and the features' effects with an Effect Plot.
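A sketch of both ideas on synthetic data with statsmodels and matplotlib: the t-statistics are the weights divided by their standard errors, and the weight plot shows each estimate with its 95% confidence interval:

```python
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 2.0, -1.0, 0.1]) + rng.normal(scale=0.5, size=200)
fit = sm.OLS(y, X).fit()

# t-statistic per weight: estimate divided by its standard error
t_stats = fit.params / fit.bse  # identical to fit.tvalues
print("t-statistics:", t_stats)

# Weight plot: point estimates with 95% confidence intervals
ci = fit.conf_int()  # array of shape (n_weights, 2)
names = ["const", "x1", "x2", "x3"]
plt.errorbar(fit.params, range(len(names)),
             xerr=(fit.params - ci[:, 0], ci[:, 1] - fit.params), fmt="o")
plt.yticks(range(len(names)), names)
plt.xlabel("weight estimate")
plt.show()
```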
Quality of Explanations
- Contrastive
- to the instance where all feature values are zero
- or to the mean-value instance (when features are normalized)
- Fidelity
- Explanations are truthful if the assumptions hold
- otherwise the model may squeeze the actual dependencies into simple linear ones, and the explanations are then not truthful
- Stability
- Stable by design
- Not selective by default
- Feature Selection (e.g. via Lasso) might be necessary
- Feature Engineering might be necessary when there are non-linearities (see the sketch below)
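A sketch of both remedies on synthetic data: Lasso makes the weights selective (sparse), and PolynomialFeatures hand-crafts non-linearities as additional input features. The alpha value is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Feature Engineering: add squared and interaction terms as new inputs
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Feature Selection: the L1 penalty drives irrelevant weights to exactly zero
lasso = Lasso(alpha=0.05).fit(X_poly, y)
print("non-zero weights:", np.sum(lasso.coef_ != 0), "of", X_poly.shape[1])
```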
Evaluation
To evaluate a linear regression model, one can use the R-Squared Metric.
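A minimal sketch computing R-Squared on held-out data with scikit-learn; the dataset is synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("R^2 on held-out data:", r2_score(y_te, model.predict(X_te)))
```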
Disadvantages
All non-linearities have to be hand-crafted and given as input features. With highly correlated features (e.g. number of rooms and size of the house), the weights are no longer interpretable in isolation.
Other
What model describes our data best?
Simple Linear Regression
A calculation with the Linear Gaussian Model yields a solution to the simple linear regression, provided not all datapoints $x_i$ are equal:

$$\hat{\beta}_1 = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

or, in terms of the data,

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The regression line is defined as $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, which goes through the Center of Mass $(\bar{x}, \bar{y})$ of the data sequence.
Empirical Correlation Coefficient
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
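A small sketch, assuming NumPy, that computes the simple-regression weights from the sample moments together with the empirical correlation coefficient; x and y are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=0.3, size=100)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar  # the line passes through (x_bar, y_bar)

r = np.sum((x - x_bar) * (y - y_bar)) / np.sqrt(
    np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))
print(beta0, beta1, r)  # note that beta1 == r * y.std() / x.std()
```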
Use Cases for Linear Regression
- Computer vision and trend analysis.
- Dataset is rather small and a Linear Model is enough to approximate the function.
- We use regression to predict a real-valued output.
Class Prediction
By applying Logistic Regression we can also do Classification.
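A minimal sketch, assuming scikit-learn: Logistic Regression fitted on toy binary labels, returning class probabilities instead of real-valued outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary labels

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # class probabilities per sample
```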