Local Surrogate
Used to explain individual, local predictions of a Black Box Model. Because it only needs the model's predictions, the approach is Model-Agnostic.
Idea
Locally approximate a Black Box Model via a simpler, intrinsically interpretable model, then generate local explanations from that surrogate.
In contrast to Global Surrogate models, local surrogate models don't try to explain the global prediction behavior of a model; they explain why the model made a certain individual prediction.
How to train local surrogate models
- Sample perturbed datapoints and predict with the Black Box Model
    - by perturbing the datapoint itself
    - by sampling from a Normal Distribution with Mean and Variance estimated from the dataset
    - could also sample from other distributions or just densely sample the feature space with a grid
- Weigh each datapoint by its Proximity to the datapoint of interest
    - by adding it multiple times to the dataset
    - by passing it as a sample weight to the model
- Train an Intrinsic interpretable model on this weighted dataset
    - for example LASSO, Ridge Regression, Decision Tree
    - using models with Regularisation makes a lot of sense to get simpler models which are easier to explain
- Make sure that we have high local Fidelity
- Explain the datapoint by interpreting the global behavior of this new model
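A minimal sketch of these steps for tabular data, assuming a fitted black-box regressor `black_box` with a `predict` method and a NumPy feature matrix `X`; the function name, the exponential kernel, and all default values are illustrative choices, not a fixed API:

```python
# Minimal sketch of a local surrogate (LIME-style) for tabular data.
# `black_box` is assumed to be any fitted model with a .predict() method,
# `X` a NumPy feature matrix; all names and defaults here are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

def explain_locally(black_box, X, x_interest, n_samples=5000,
                    kernel_width=0.75, alpha=0.01, random_state=0):
    rng = np.random.default_rng(random_state)

    # 1) Sample perturbed datapoints from a Normal distribution with mean and
    #    variance estimated from the dataset.
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = rng.normal(mean, std, size=(n_samples, X.shape[1]))

    # 2) Label the samples with the black-box predictions.
    y_bb = black_box.predict(Z)

    # 3) Weigh each sample by its proximity to the point of interest
    #    (exponential kernel on the distance in standardized feature space).
    dist = np.linalg.norm((Z - x_interest) / std, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 4) Train an intrinsically interpretable, regularised model (LASSO) on the
    #    proximity-weighted dataset.
    surrogate = Lasso(alpha=alpha)
    surrogate.fit(Z, y_bb, sample_weight=weights)

    # 5) The surrogate's coefficients are the local explanation; Z and the
    #    weights are returned so fidelity can be checked afterwards.
    return surrogate, Z, weights
```

The coefficients in `surrogate.coef_` then serve as the local explanation: one weight per feature describing the black box's behavior around `x_interest`.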
Mathematically
Out of all candidate explanations (interpretable models) $g \in G$, find the one that minimizes the explanation's complexity $\Omega(g)$ (kept low e.g. via LASSO) plus the local loss $L(f, g, \pi_x)$ in the neighborhood defined by the proximity measure $\pi_x$, where $f$ is the complex Black Box Model:

$$\text{explanation}(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g)$$
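A common instantiation (e.g. in LIME) uses a locality-weighted squared loss with an exponential kernel as the proximity measure; the kernel width $\sigma$ is a free choice:

$$L(f, g, \pi_x) = \sum_{z \in Z} \pi_x(z)\,\bigl(f(z) - g(z)\bigr)^2, \qquad \pi_x(z) = \exp\!\left(-\frac{d(x, z)^2}{\sigma^2}\right)$$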
Visually
- A → Black Box Model
- B → Sample perturbed datapoints (Normal Distribution)
- C → Weigh datapoints based on Proximity to point of interest
- D → Train Intrinsic interpretable model
- Model must have high Fidelity locally (it should match the black box predictions)
    - assess with Accuracy of the interpretable model on the weighted dataset
    - global fidelity can be computed when predicting on the original dataset
- What is a good Proximity measure and how broad should the neighborhood be?
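A pragmatic way to approach this question is to treat the kernel width as a tuning knob and compare the resulting local and global fidelity, here sketched with a weighted R² and the hypothetical `explain_locally` function from above:

```python
# Sketch: compare local and global fidelity for several kernel widths,
# reusing the hypothetical explain_locally sketch from above.
from sklearn.metrics import r2_score

for kernel_width in (0.25, 0.75, 2.0):  # arbitrary widths to try
    surrogate, Z, weights = explain_locally(black_box, X, x_interest,
                                            kernel_width=kernel_width)
    # Local fidelity: proximity-weighted agreement between surrogate and
    # black box on the perturbed samples around the point of interest.
    local_r2 = r2_score(black_box.predict(Z), surrogate.predict(Z),
                        sample_weight=weights)
    # Global fidelity: agreement on the original dataset; usually much lower,
    # since the surrogate only claims validity near the point of interest.
    global_r2 = r2_score(black_box.predict(X), surrogate.predict(X))
    print(f"width {kernel_width}: local R2 {local_r2:.3f}, global R2 {global_r2:.3f}")
```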
Pros
- Model-Agnostic
- human-friendly
- selective (short) explanations
- possibly Contrastive
- **works for all data types** (tabular data, text, images)
- Fidelity measure
- very easy to use
- can use interpretable features while the Black Box Model does not
Cons
- definition of neighborhood is a big problem
- have to try different kernels and look for best explanations
- Curse of Dimensionality, proximity measures can become useless in higher dimensions
- sampling can lead to unlikely datapoints just like in PDP
- complexity of model has to be defined in advance
- instability of explanations because of stochastic sampling (see the stability check sketched below)
- difficult to trust explanations
- explanations can be manipulated by the data scientist, e.g. through the choice of kernel and sampling
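Because the explanation depends on random sampling, a simple stability check is to rerun it with different seeds and compare the coefficients; this again reuses the hypothetical `explain_locally` sketch:

```python
# Sketch: probe explanation stability by re-running the (hypothetical)
# explain_locally sketch from above with different random seeds.
import numpy as np

coefs = []
for seed in range(5):
    surrogate, _, _ = explain_locally(black_box, X, x_interest, random_state=seed)
    coefs.append(surrogate.coef_)

# Large standard deviations relative to the mean coefficients indicate
# unstable explanations for this point and sampling setup.
coefs = np.array(coefs)
print("mean coefficients:", coefs.mean(axis=0))
print("std across seeds: ", coefs.std(axis=0))
```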