Partial Dependence Plot

Definition

A partial dependence plot (PDP) shows the marginal effect that one or two features have on the predicted outcome of an ML model.

A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex. For example, when applied to a linear regression model, partial dependence plots always show a linear relationship.
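The linear case can be checked by hand. As a short sketch, assume a linear model with illustrative coefficients $\beta_{0}$, $\beta_{S}$ and $\beta_{C}$ (these symbols are not part of the original notes), where $x_{S}$ is the plotted feature and $X_{C}$ collects the remaining features, as defined further down:

$\hat{f} (x_{S}, x_{C}) = \beta_{0} + \beta_{S} x_{S} + \beta_{C}^{\top} x_{C}$

$\hat{f}_{S} (x_{S}) = E_{X_{C}} [\beta_{0} + \beta_{S} x_{S} + \beta_{C}^{\top} X_{C}] = \beta_{S} x_{S} + (\beta_{0} + \beta_{C}^{\top} E [X_{C}])$

The term in parentheses does not depend on $x_{S}$, so the PDP is a straight line with slope $\beta_{S}$.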

$S$ is the subset of features (one or two) whose effect we want to plot. $C$ contains all other features, i.e. those not in $S$ . We assume that the features in $C$ are uncorrelated with the ones in $S$ .

The partial dependence is $\hat{f}_{S} (x_{S}) = E_{X_{C}} [\hat{f} (x_{S}, X_{C})] = \int \hat{f} (x_{S}, X_{C}) \, d P (X_{C}) .$ We can estimate this integral with the Monte Carlo method: for every data point we fix the features in $S$ to the value $x_{S}$ , keep the observed values of the features in $C$ , and take the simple average of the predictions over all $n$ data points. $\hat{f}_{S} (x_{S}) = \frac{1}{n} \sum_{i = 1}^{n} \hat{f} (x_{S}, x_{C}^{(i)}) ,$ where $x_{C}^{(i)}$ are the values of the features in $C$ for the $i$-th data point.
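The averaging step can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: `partial_dependence_1d` and `toy_model` are hypothetical names chosen for this example, and the toy model $f(x) = 2 x_{0} + x_{1}^{2}$ is an assumption made only so that the result is easy to check.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Monte Carlo estimate of the partial dependence on one feature.

    For each grid value, the feature column is overwritten in a copy of X
    and the predictions are averaged over all n rows.
    """
    X = np.asarray(X, dtype=float)
    pd_values = np.empty(len(grid))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value              # fix x_S for every data point
        pd_values[k] = predict(X_mod).mean()   # average over the observed x_C^(i)
    return pd_values

# Hypothetical toy model (an assumption for illustration): f(x) = 2*x0 + x1^2
def toy_model(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2.0, 2.0, 5)

pd_x0 = partial_dependence_1d(toy_model, X, feature=0, grid=grid)
print(pd_x0)
```

For this toy model the estimate is a straight line in $x_{0}$ with slope 2, shifted by the sample mean of $x_{1}^{2}$.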

PDP-based Feature Importance

We can use the flatness of the PDP as a measure of feature importance: the more the PDP varies over the feature's values, the more important the feature.

flat → low importance
high variance → high importance

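$I (x_{S}) = \frac{1}{K - 1} \sum_{k = 1}^{K} \left( \hat{f}_{S} (x_{S}^{(k)}) - \frac{1}{K} \sum_{j = 1}^{K} \hat{f}_{S} (x_{S}^{(j)}) \right)^{2}$

Here $x_{S}^{(k)}$ , $k = 1, \dots, K$ , are the values at which the PDP is evaluated, so $I (x_{S})$ is the sample variance of the PDP values.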
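The importance measure is the sample variance of the PDP values, which can be sketched as follows. `pdp_importance` is a hypothetical helper name, and the two PDP arrays are made-up values used only to contrast a flat curve with a steep one.

```python
import numpy as np

def pdp_importance(pd_values):
    """Sample variance (divisor K - 1) of the PDP values at K grid points."""
    pd_values = np.asarray(pd_values, dtype=float)
    centered = pd_values - pd_values.mean()   # subtract the mean PDP value
    return (centered ** 2).sum() / (len(pd_values) - 1)

# Made-up PDP values for illustration
flat_pdp = np.array([0.5, 0.5, 0.5, 0.5])    # feature has no effect
steep_pdp = np.array([0.0, 1.0, 2.0, 3.0])   # feature has a strong effect

print(pdp_importance(flat_pdp), pdp_importance(steep_pdp))
```

The helper gives the same result as `np.var(pd_values, ddof=1)`; it is written out here only to mirror the formula term by term.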

Examples

For two features, the partial dependence can be shown as a heatmap.

Pros

intuitive

clear interpretation

easy to implement

“causal” interpretation (at least in the model context, not in the real world though)

Cons

maximum of two features

assumption of independence with other features

otherwise the average includes data points that are extremely unlikely (see rug plot)

→ use Accumulated Local Effect Plot

heterogeneous effects might be hidden

→ use Individual Conditional Expectation Curve

does not show the feature distribution

→ use rug plot (see example)
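The point about hidden heterogeneous effects can be made concrete with a small sketch. The PDP is the pointwise average of the individual conditional expectation (ICE) curves, so opposite effects can cancel out. The names `ice_curves` and `interaction_model` are hypothetical, and the model $f(x) = x_{0} x_{1}$ with $x_{1} \in \{-1, +1\}$ is an assumption chosen to make the cancellation exact.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Return an array of shape (n_samples, len(grid)): one ICE curve per row of X."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value       # set the feature to the grid value for all rows
        curves[:, k] = predict(X_mod)   # keep one prediction per data point
    return curves

# Hypothetical model with a pure interaction (an assumption for illustration)
def interaction_model(X):
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=200)
x1 = np.repeat([1.0, -1.0], 100)        # half the rows have x1 = +1, half x1 = -1
X = np.column_stack([x0, x1])
grid = np.linspace(-1.0, 1.0, 5)

curves = ice_curves(interaction_model, X, feature=0, grid=grid)
pdp = curves.mean(axis=0)               # the PDP is the average of the ICE curves

print(pdp)          # flat
print(curves[0])    # rising curve for a row with x1 = +1
print(curves[-1])   # falling curve for a row with x1 = -1
```

Here the PDP for $x_{0}$ is flat even though every single prediction depends on $x_{0}$ ; only the ICE curves reveal the two opposite slopes.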

In sklearn

Implemented in sklearn.inspection (partial_dependence and PartialDependenceDisplay); alternatively, you can use PDPBox.
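A minimal usage sketch with scikit-learn follows. The synthetic data, the choice of `GradientBoostingRegressor` and the variable names are assumptions for illustration only; the example reads only the `"average"` entry of the returned result and assumes a scikit-learn version that supports the `kind` argument.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Synthetic data (an assumption for illustration): only feature 0 drives the target
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Averaged predictions on a grid of 20 values per feature
res0 = partial_dependence(model, X, features=[0], kind="average", grid_resolution=20)
res2 = partial_dependence(model, X, features=[2], kind="average", grid_resolution=20)
pd_feature0 = res0["average"][0]   # PDP values for the informative feature
pd_feature2 = res2["average"][0]   # PDP values for a noise feature

print(pd_feature0.max() - pd_feature0.min())   # large range: strong effect
print(pd_feature2.max() - pd_feature2.min())   # small range: nearly flat

# Plotting (requires matplotlib); a pair of features gives a two-way plot:
# from sklearn.inspection import PartialDependenceDisplay
# PartialDependenceDisplay.from_estimator(model, X, features=[0, (0, 1)])
```

The PDP of the informative feature spans a wide range, while the PDP of the noise feature stays close to flat.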
