Linear Regression
Predicts the target as a weighted sum of the feature inputs.
Assumptions
- Linearity (no interactions and no non-linearities)
- Normality (target variable follows a Normal Distribution given the features)
- Homoscedasticity (constant variance of the error term)
- violated e.g. when the price variance of expensive houses is much higher than the price variance of cheap houses
- Independence
- if not (e.g. repeated measurements of the same subject), use Mixed Effect Models or GEEs
- Fixed Features (there is no measurement error or uncertainty in the data)
- otherwise a more complex model has to be used to account for the possible measurement error
- Uncorrelated Features (no strong Correlation between features)
- with correlated features it becomes hard to tell which of them actually has the effect on the target
- use PCA to get less correlated features (see the check sketch after this list)
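A minimal sketch of checking two of these assumptions with statsmodels: homoscedasticity via the Breusch-Pagan test and feature correlation via variance inflation factors. The data here is synthetic and only stands in for a real dataset:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic stand-in for real features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_const = sm.add_constant(X)  # prepend an intercept column
fit = sm.OLS(y, X_const).fit()

# Homoscedasticity: Breusch-Pagan test (a small p-value suggests heteroscedasticity)
_, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X_const)
print("Breusch-Pagan p-value:", bp_pvalue)

# Uncorrelated features: a VIF far above 10 hints at problematic multicollinearity
vifs = [variance_inflation_factor(X_const, i) for i in range(1, X_const.shape[1])]
print("VIFs:", vifs)
```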
Problem to solve
Given some datapoints $x^{(i)}$ with targets $y^{(i)}$, find weights $\beta$ that minimize the error $\sum_i \big(y^{(i)} - \beta^T x^{(i)}\big)^2$.
There is a closed-form solution, $\hat{\beta} = (X^T X)^{-1} X^T y$ (see below), or you can use Gradient Descent to calculate the weights by minimizing the Mean Squared Error.
If our data has Gaussian noise, we can even get confidence intervals for $\beta$.
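A small sketch of the closed-form solution with NumPy; the toy data and true coefficients are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
# Closed form beta_hat = (X^T X)^{-1} X^T y, computed stably via least squares
beta_hat, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(beta_hat)  # approximately [3.0, 1.5, -2.0]
```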
Explanations
Linear Regression has Intrinsic explanations.
We can get Explanations from a Linear Regression model via the estimated weights.
- Numerical
- one unit change → outcome changes by its weight
- Binary
- changing from 0 to 1 → outcome changes by weight
- Categorical
- One-Hot-Encoding → Binary
- Intercept
- Predicted value when all features are at their mean (when standardized)
- otherwise: predicted value when all features are zero
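A hypothetical example of reading such explanations off the fitted weights; the feature names (size_m2, has_balcony) and the data-generating process are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
size_m2 = rng.uniform(40, 160, size=300)    # numerical feature
has_balcony = rng.integers(0, 2, size=300)  # binary feature
price = (50_000 + 2_000 * size_m2 + 15_000 * has_balcony
         + rng.normal(scale=5_000, size=300))

model = LinearRegression().fit(np.column_stack([size_m2, has_balcony]), price)
w_size, w_balcony = model.coef_

print(f"+1 m^2 of size -> price changes by {w_size:,.0f}")     # numerical weight
print(f"balcony 0 -> 1 -> price changes by {w_balcony:,.0f}")  # binary weight
print(f"intercept (all features zero): {model.intercept_:,.0f}")
```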
Feature Importance can be calculated with the t-statistic $t_{\hat{\beta}_j} = \hat{\beta}_j / SE(\hat{\beta}_j)$: the importance increases with the feature's weight and decreases with the uncertainty (standard error) of its estimate.
We can plot the importances with a Weight Plot and the features' effects with an Effect Plot.
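A sketch of both ideas on synthetic data with statsmodels and matplotlib: the t-statistics are the weights divided by their standard errors, and the weight plot shows each estimate with its 95% confidence interval:

```python
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 2.0, -1.0, 0.1]) + rng.normal(scale=0.5, size=200)
fit = sm.OLS(y, X).fit()

# t-statistic per weight: estimate divided by its standard error
t_stats = fit.params / fit.bse  # identical to fit.tvalues
print("t-statistics:", t_stats)

# Weight plot: point estimates with 95% confidence intervals
ci = fit.conf_int()  # array of shape (n_weights, 2)
names = ["const", "x1", "x2", "x3"]
plt.errorbar(fit.params, range(len(names)),
             xerr=(fit.params - ci[:, 0], ci[:, 1] - fit.params), fmt="o")
plt.yticks(range(len(names)), names)
plt.xlabel("weight estimate")
plt.show()
```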
Quality of Explanations
- Contrastive
- to the instance where all feature values are zero
- or to the mean-value instance (when features are normalized)
- Fidelity
- Explanations are truthful if the assumptions hold
- otherwise the model may squeeze the actual dependencies into simple linear ones, and the explanations are then not truthful
- Stability
- Stable by design
- Not selective by default
- Feature Selection (e.g. via Lasso) might be necessary
- Feature Engineering might be necessary when there are non-linearities (see the sketch below)
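A sketch of both remedies on synthetic data: Lasso makes the weights selective (sparse), and PolynomialFeatures hand-crafts non-linearities as additional input features. The alpha value is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Feature Engineering: add squared and interaction terms as new inputs
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Feature Selection: the L1 penalty drives irrelevant weights to exactly zero
lasso = Lasso(alpha=0.05).fit(X_poly, y)
print("non-zero weights:", np.sum(lasso.coef_ != 0), "of", X_poly.shape[1])
```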
Evaluation
To evaluate a linear regression model, one can use the R-Squared Metric.
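A minimal sketch computing R-Squared on held-out data with scikit-learn; the dataset is synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("R^2 on held-out data:", r2_score(y_te, model.predict(X_te)))
```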
Disadvantages
All non-linearities have to be hand-crafted and given as input features. With highly correlated features (e.g. number of rooms and size of the house), the weights are no longer interpretable in isolation.
Other
What model describes our data best?
Simple Linear Regression
A calculation with the Linear Gaussian Model yields a solution to the simple linear regression, provided not all datapoints $x_i$ are equal:

$$\hat{\beta}_1 = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

or, in terms of the data,

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The regression line is defined as $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, which goes through the Center of Mass $(\bar{x}, \bar{y})$ of the data sequence.
Empirical Correlation Coefficient
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
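A small sketch, assuming NumPy, that computes the simple-regression weights from the sample moments together with the empirical correlation coefficient; x and y are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=0.3, size=100)

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar  # the line passes through (x_bar, y_bar)

r = np.sum((x - x_bar) * (y - y_bar)) / np.sqrt(
    np.sum((x - x_bar) ** 2) * np.sum((y - y_bar) ** 2))
print(beta0, beta1, r)  # note that beta1 == r * y.std() / x.std()
```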
Use Cases for Linear Regression
- Computer vision and trend analysis.
- Dataset is rather small and a Linear Model is enough to approximate the function.
- We use regression to predict a real-valued output.
Class Prediction
By applying Logistic Regression we can also do Classification.
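A minimal sketch, assuming scikit-learn: Logistic Regression fitted on toy binary labels, returning class probabilities instead of real-valued outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary labels

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # class probabilities per sample
```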