RuleFit
A combination of Linear Regression and Decision Trees. This lets the linear model also use interactions between features, which it cannot capture on its own.
Algorithm
- Fit many decision trees of random depth and extract their rules (root-to-node paths)
- Add the extracted rules as new binary features that are either fulfilled (1) or not (0)
- Clip (winsorize) the original features at a chosen quantile to limit the influence of outliers
- Standardize the clipped features so they are on the same scale as the binary rule features
- Fit LASSO on the new feature matrix
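The steps above can be sketched in a few lines with scikit-learn. This is a minimal illustration, not the reference implementation: as a simplification it uses one rule per leaf (leaf membership) rather than every root-to-node path, and the quantiles, tree depth, and LASSO penalty are assumed values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200)  # target with an interaction

# Step 1: fit an ensemble of shallow trees
forest = RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0)
forest.fit(X, y)

# Step 2: binary rule features; simplification: one rule per leaf
# (leaf membership) instead of every path prefix
leaves = forest.apply(X)  # (n_samples, n_trees) leaf indices
rules = np.hstack([
    (leaves[:, [t]] == np.unique(leaves[:, t])).astype(float)
    for t in range(leaves.shape[1])
])

# Step 3: clip (winsorize) the original features, here at the 5%/95% quantiles
lo, hi = np.quantile(X, [0.05, 0.95], axis=0)
X_clip = np.clip(X, lo, hi)

# Step 4: standardize the clipped features to the scale of the binary rules
X_lin = 0.4 * X_clip / X_clip.std(axis=0)

# Step 5: fit LASSO on the combined feature matrix
model = Lasso(alpha=0.01)
model.fit(np.hstack([X_lin, rules]), y)
```

The sparse LASSO fit then selects a small subset of linear terms and rules, which is what keeps the final model interpretable.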
Explainability
As the final model of RuleFit is a (sparse) Linear Regression, we can interpret the weights of the original features in the usual way.
For the new rule features we can calculate the importance using
$$I_k = |\alpha_k| \cdot \sqrt{s_k (1 - s_k)}$$
where $s_k = \frac{1}{n} \sum_{i=1}^{n} r_k(x^{(i)})$ is the rule support for rule $r_k$ (the share of datapoints that satisfy it) and $\alpha_k$ is its Linear Regression weight.
So to get the Feature Importance of a feature $x_j$ for exactly one datapoint $x$ (Local Explanation) we calculate
$$J_j(x) = I_j(x) + \sum_{k:\, x_j \in r_k} \frac{I_k(x)}{m_k}$$
where $I_j(x)$ is the importance of the linear term, the sum runs over all rules $r_k$ in which $x_j$ appears, and $m_k$ is the number of features in rule $r_k$. To get a Global Explanation for a feature we can sum over all datapoints: $J_j(X) = \sum_{i=1}^{n} J_j(x^{(i)})$.
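The rule-importance formula is easy to compute once the LASSO is fitted. In this sketch the weights and the binary rule matrix are hypothetical stand-ins for fitted values, just to show the arithmetic:

```python
import numpy as np

# hypothetical fitted LASSO weights for 4 rules
alpha = np.array([0.8, 0.0, -0.5, 0.2])

# hypothetical binary rule matrix: rows = datapoints, cols = rules
rules = np.array([[1, 0, 1, 1],
                  [0, 0, 1, 0],
                  [1, 1, 0, 1],
                  [1, 0, 0, 0]], dtype=float)

# s_k: support of rule k = share of datapoints that satisfy it
support = rules.mean(axis=0)

# I_k = |alpha_k| * sqrt(s_k * (1 - s_k))
importance = np.abs(alpha) * np.sqrt(support * (1 - support))
```

Note that a rule dropped by LASSO (weight 0) gets importance 0, and so does a rule satisfied by all or no datapoints, since $\sqrt{s_k(1-s_k)}$ vanishes at $s_k \in \{0, 1\}$.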
Resources
The Python package imodels implements RuleFit along with many other interpretable models.