Friedman’s H-statistic

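If two features $x_j$ and $x_k$ do not interact, the two-dimensional partial dependence decomposes into the sum of the one-dimensional ones, $PD_{jk}(x_j, x_k) = PD_j(x_j) + PD_k(x_k)$. If feature $x_j$ has no interaction with any other feature, the prediction function decomposes as $\hat{f}(x) = PD_j(x_j) + PD_{-j}(x_{-j})$. Here $x_{-j}$ means all features that are not $x_j$, and all partial dependence functions (and $\hat{f}$) are centered to mean zero.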
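The decomposition can be checked numerically. The sketch below is an illustration, not a reference implementation: it assumes a brute-force estimator (the hypothetical helper `centered_pd`) that fixes the chosen features to each data point's values, averages the predictions over all rows, and centers the result. It uses two made-up models on uniform data, one additive and one with a pure $x_0 x_1$ interaction.

```python
import numpy as np


def centered_pd(predict, X, features):
    """Brute-force centered partial dependence of `features`, one value per data point.

    For each row i, the columns in `features` are set to row i's values in every row
    of X, the predictions are averaged, and the n averages are finally centered.
    """
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]  # fix the chosen features to x^(i)
        pd[i] = predict(X_mod).mean()        # average over the remaining features
    return pd - pd.mean()


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))

# Hypothetical models for illustration only.
models = {
    "additive": lambda X: X[:, 0] + X[:, 1] ** 2,   # no interaction
    "interacting": lambda X: X[:, 0] * X[:, 1],     # pure interaction
}

max_residual = {}
for name, f in models.items():
    residual = centered_pd(f, X, [0, 1]) - centered_pd(f, X, [0]) - centered_pd(f, X, [1])
    max_residual[name] = np.abs(residual).max()
    print(f"{name}: max |PD_jk - PD_j - PD_k| = {max_residual[name]:.3g}")
```

For the additive model the residual vanishes up to floating-point error; for the interacting model it does not, which is exactly the gap the H-statistic measures.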

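The two statistics are $H^2_{jk}$ (interaction of feature $j$ with feature $k$) and $H^2_j$ (interaction of feature $j$ with all other features). Both measure, over the $n$ data points $x^{(i)}$, the share of variance that the no-interaction decomposition fails to explain:

$$H^2_{jk} = \frac{\sum_{i=1}^n \left[PD_{jk}(x_j^{(i)}, x_k^{(i)}) - PD_j(x_j^{(i)}) - PD_k(x_k^{(i)})\right]^2}{\sum_{i=1}^n PD^2_{jk}(x_j^{(i)}, x_k^{(i)})}$$

$$H^2_j = \frac{\sum_{i=1}^n \left[\hat{f}(x^{(i)}) - PD_j(x_j^{(i)}) - PD_{-j}(x_{-j}^{(i)})\right]^2}{\sum_{i=1}^n \hat{f}^2(x^{(i)})}$$

Each statistic is 0 if there is no interaction and grows with the share of variance explained by the interaction.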
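A minimal sketch of both statistics, assuming the same brute-force centered partial dependence estimator as above (the helper names `centered_pd`, `h2_pair` and `h2_total` are hypothetical, not from any library). The toy model $x_0 x_1 + x_2$ is also an assumption, chosen so the expected answer is known: $x_0$ and $x_1$ interact, $x_2$ does not.

```python
import numpy as np


def centered_pd(predict, X, features):
    """Brute-force centered partial dependence of `features` at every data point."""
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]
        pd[i] = predict(X_mod).mean()
    return pd - pd.mean()


def h2_pair(predict, X, j, k):
    """H^2_jk: interaction strength between features j and k."""
    pd_jk = centered_pd(predict, X, [j, k])
    pd_j = centered_pd(predict, X, [j])
    pd_k = centered_pd(predict, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)


def h2_total(predict, X, j):
    """H^2_j: interaction strength between feature j and all other features."""
    others = [c for c in range(X.shape[1]) if c != j]
    f = predict(X)
    f = f - f.mean()                         # center the predictions like the PDs
    pd_j = centered_pd(predict, X, [j])
    pd_not_j = centered_pd(predict, X, others)
    return np.sum((f - pd_j - pd_not_j) ** 2) / np.sum(f ** 2)


# Hypothetical model: x0 and x1 interact, x2 enters additively.
model = lambda X: X[:, 0] * X[:, 1] + X[:, 2]
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))

h2_01 = h2_pair(model, X, 0, 1)
h2_02 = h2_pair(model, X, 0, 2)
h2_0 = h2_total(model, X, 0)
h2_2 = h2_total(model, X, 2)
print(f"H^2_01 = {h2_01:.3f}, H^2_02 = {h2_02:.3f}")
print(f"H^2_0  = {h2_0:.3f}, H^2_2  = {h2_2:.3f}")
```

One would expect $H^2_{01}$ close to 1 and $H^2_{02}$, $H^2_2$ near 0. $H^2_0$ should land around 0.25, since for independent uniform features on $[-1, 1]$ the interaction term has variance $1/9$ against $1/3$ for the additive term $x_2$.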

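The statistic is expensive to evaluate: with $n$ data points, both $H^2_{jk}$ and $H^2_j$ need on the order of $n^2$ calls to the model, because each of the $n$ partial dependence values is itself an average over all $n$ data points.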
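To make the quadratic cost concrete, the sketch below wraps a model in a hypothetical `CountingModel` that counts predicted rows, then runs the brute-force estimator for both statistics. The exact constants are an assumption tied to this naive scheme (three partial dependence functions for $H^2_{jk}$; two plus one pass over the data for $H^2_j$); other implementations may differ, and evaluating the partial dependence on a subsample cuts the cost at the price of more variance.

```python
import numpy as np


class CountingModel:
    """Wraps a prediction function and counts the rows it is asked to predict."""

    def __init__(self, predict):
        self.predict = predict
        self.n_predictions = 0

    def __call__(self, X):
        self.n_predictions += X.shape[0]
        return self.predict(X)


def centered_pd(predict, X, features):
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]
        pd[i] = predict(X_mod).mean()  # n predictions per data point
    return pd - pd.mean()


def count_h2_pair(model, X, j, k):
    counter = CountingModel(model)
    for features in ([j, k], [j], [k]):     # PD_jk, PD_j, PD_k
        centered_pd(counter, X, features)
    return counter.n_predictions


def count_h2_total(model, X, j):
    counter = CountingModel(model)
    others = [c for c in range(X.shape[1]) if c != j]
    counter(X)                              # f(x^(i)) for all i
    centered_pd(counter, X, [j])            # PD_j
    centered_pd(counter, X, others)         # PD_{-j}
    return counter.n_predictions


model = lambda X: X[:, 0] * X[:, 1] + X[:, 2]  # hypothetical model
rng = np.random.default_rng(2)
counts = {}
for n in (50, 100, 200):
    X = rng.uniform(-1, 1, size=(n, 3))
    counts[n] = (count_h2_pair(model, X, 0, 1), count_h2_total(model, X, 0))
    print(n, counts[n])
```

With this scheme the counts are exactly $3n^2$ for $H^2_{jk}$ and $2n^2 + n$ for $H^2_j$, so doubling $n$ roughly quadruples the number of predictions.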

Examples

Pros

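  • underlying theory (follows from the partial dependence decomposition)
  • meaningful interpretation (share of variance explained by the interaction)
  • dimensionless (comparable across features and models)
  • detects all kinds of interactions, regardless of their form
  • extends to interactions of arbitrarily high order (three or more features)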

Cons

  • computationally expensive
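  • estimates have variance, especially when computed on a subsample of the data
  • no statistical test for whether an interaction is strong enough to matter
  • same problem as the PDP: correlated features can lead to high values, because the partial dependence is evaluated at unrealistic feature combinations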