Friedman’s H-statistic

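If features $j$ and $k$ do not interact, the two-way partial dependence function decomposes into the sum of the one-way partial dependence functions, $PD_{jk}(x_{j}, x_{k}) = PD_{j}(x_{j}) + PD_{k}(x_{k})$. Likewise, if feature $j$ does not interact with any other feature, the prediction function decomposes into $\hat{f}(x) = PD_{j}(x_{j}) + PD_{-j}(x_{-j})$. Here $-j$ denotes all features except $j$, and all partial dependence functions are assumed to be centered at zero.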
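The decomposition can be checked numerically. The sketch below assumes brute-force partial dependence estimation (fix the chosen features to one row's values, average the predictions over all rows) and centers every PD function at zero. The helper `centered_pd` and the two toy models are hypothetical illustrations, not a library API. For the additive model the gap between $PD_{jk}$ and $PD_{j} + PD_{k}$ is zero up to floating-point error; for the model with an $x_0 x_1$ term it is not.

```python
import numpy as np

def centered_pd(predict, X, features):
    """Brute-force partial dependence of `features`, evaluated at every row of X
    and shifted to mean zero (hypothetical helper)."""
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]  # fix the chosen features to row i's values
        pd[i] = predict(X_mod).mean()        # average the prediction over the data
    return pd - pd.mean()

def additive(X):
    # Hypothetical model without interactions
    return 2 * X[:, 0] + X[:, 1] ** 2 + X[:, 2]

def product(X):
    # Hypothetical model in which x0 and x1 interact
    return X[:, 0] * X[:, 1] + X[:, 2]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))

gaps = {}
for name, f in [("additive", additive), ("product", product)]:
    pd_jk = centered_pd(f, X, [0, 1])
    pd_j = centered_pd(f, X, [0])
    pd_k = centered_pd(f, X, [1])
    # Largest deviation from PD_jk = PD_j + PD_k over the data points
    gaps[name] = np.max(np.abs(pd_jk - (pd_j + pd_k)))
    print(f"{name}: max |PD_jk - PD_j - PD_k| = {gaps[name]:.3g}")
```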

The two statistics are $H_{jk}^{2} = \frac{\sum_{i=1}^{n} \left[ PD_{jk}(x_{j}^{(i)}, x_{k}^{(i)}) - PD_{j}(x_{j}^{(i)}) - PD_{k}(x_{k}^{(i)}) \right]^{2}}{\sum_{i=1}^{n} PD_{jk}^{2}(x_{j}^{(i)}, x_{k}^{(i)})}$ (interaction between features $j$ and $k$) and $H_{j}^{2} = \frac{\sum_{i=1}^{n} \left[ \hat{f}(x^{(i)}) - PD_{j}(x_{j}^{(i)}) - PD_{-j}(x_{-j}^{(i)}) \right]^{2}}{\sum_{i=1}^{n} \hat{f}^{2}(x^{(i)})}$ (interaction between feature $j$ and all other features).
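A minimal sketch of both formulas in NumPy, assuming brute-force partial dependence estimates and centering of $\hat{f}$ and every PD function at zero, as the decomposition requires. The names `centered_pd`, `h2_pair`, `h2_total` and the toy `model` are hypothetical; this is not the implementation of any particular package.

```python
import numpy as np

def centered_pd(predict, X, features):
    """Brute-force PD of `features` at every row of X, centered to mean zero."""
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]  # fix the chosen features to row i
        pd[i] = predict(X_mod).mean()        # average over the remaining features
    return pd - pd.mean()

def h2_pair(predict, X, j, k):
    """H_jk^2: interaction strength between features j and k."""
    pd_jk = centered_pd(predict, X, [j, k])
    pd_j = centered_pd(predict, X, [j])
    pd_k = centered_pd(predict, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

def h2_total(predict, X, j):
    """H_j^2: interaction strength between feature j and all other features."""
    others = [c for c in range(X.shape[1]) if c != j]
    f_hat = predict(X)
    f_hat = f_hat - f_hat.mean()  # center the predictions like the PD functions
    pd_j = centered_pd(predict, X, [j])
    pd_rest = centered_pd(predict, X, others)  # PD_{-j}
    return np.sum((f_hat - pd_j - pd_rest) ** 2) / np.sum(f_hat ** 2)

def model(X):
    # Hypothetical model: x0 and x1 interact, x2 enters additively
    return X[:, 0] * X[:, 1] + X[:, 2]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
print(f"H^2_01 = {h2_pair(model, X, 0, 1):.3f}")
print(f"H^2_0  = {h2_total(model, X, 0):.3f}")
print(f"H^2_2  = {h2_total(model, X, 2):.3f}")
```

Under these assumptions $H_{01}^{2}$ comes out close to 1 (the joint effect of $x_0$ and $x_1$ is pure interaction), $H_{2}^{2}$ is essentially 0 (x2 is additive), and $H_{0}^{2}$ lands near 0.25, the share of the prediction variance that the $x_0 x_1$ term accounts for.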

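Both statistics are expensive to evaluate: $H_{jk}^{2}$ takes $2n^{2}$ and $H_{j}^{2}$ takes $3n^{2}$ calls to the model, because each partial dependence value at one of the $n$ data points is itself an average over all $n$ data points.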

Examples

Pros

underlying theory (the partial dependence decomposition)

meaningful interpretation (share of variance explained by the interaction)

dimensionless (comparable across features and models)

detects all kinds of interactions, regardless of their form

extends to interactions of arbitrarily high order (e.g. among three or more features)

Cons

computationally expensive

estimates have variance

no general test or threshold for when an interaction is strong enough to matter

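same problem as the PDP: correlated features can lead to high values, because the partial dependence is evaluated at unrealistic feature combinations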
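The cost and the variance are linked: a common way to reduce the quadratic cost is to estimate the statistic on a random subsample, which makes the estimate noisier. The sketch below assumes the same brute-force, centered PD estimation as in the formulas above; the helpers and the toy model (main effects plus a weaker $x_0 x_1$ interaction, for which $H_{01}^{2}$ should be roughly 1/7 on uniform data) are hypothetical. It compares one estimate on 400 rows with 20 estimates on subsamples of 30 rows.

```python
import numpy as np

def centered_pd(predict, X, features):
    """Brute-force PD of `features` at every row of X, centered to mean zero."""
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]  # fix the chosen features to row i
        pd[i] = predict(X_mod).mean()        # average over the remaining features
    return pd - pd.mean()

def h2_pair(predict, X, j, k):
    """H_jk^2 from brute-force, centered PD estimates (hypothetical helper)."""
    pd_jk = centered_pd(predict, X, [j, k])
    pd_j = centered_pd(predict, X, [j])
    pd_k = centered_pd(predict, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

def model(X):
    # Hypothetical model: main effects plus a weaker x0*x1 interaction
    return X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1]

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))

# Full-data estimate: cost grows quadratically with the number of rows
h_full = h2_pair(model, X, 0, 1)

# Cheaper estimates on random subsamples of 30 rows each
estimates = [
    h2_pair(model, X[rng.choice(len(X), size=30, replace=False)], 0, 1)
    for _ in range(20)
]
print(f"full data (400 rows): {h_full:.3f}")
print(f"subsamples (30 rows): mean {np.mean(estimates):.3f}, std {np.std(estimates):.3f}")
```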

Marcs Notes
