Counterfactual
Local explanation that changes a datapoint in a minimal way such that it achieves a different, predefined output. Often a very good explanation, because it shows what matters most locally. Usually Model-Agnostic.
If X had not occurred, Y would not have occurred
Unlike prototypes, counterfactuals do not have to be actual instances from the training data; they can be a new combination of feature values.
- Human-friendly, because
    - Contrastive → compared to the current instance via a “what-if” scenario
    - Selective → only a small number of feature changes
- Problem → there are multiple different counterfactuals that all change the prediction equally well (Rashomon Effect)
Solve by
- showing all counterfactuals
- selecting only the best ones (by some measure)
A good counterfactual (the “measure”) should
- produce the predefined prediction as closely as possible
- be as similar as possible to the original instance regarding feature values (a concrete distance is sketched below)
- change as few features as possible
- come with multiple diverse alternative explanations
- use likely feature values (respecting feature correlations etc.)
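For the similarity criterion, one common concrete choice of distance $d(x, x')$ is the Manhattan distance with each feature weighted by the inverse of its median absolute deviation (MAD), as proposed by Wachter et al. A minimal sketch, with all function and variable names being illustrative:

```python
import numpy as np

def mad_weighted_l1(x, x_cf, X_train):
    """Manhattan distance between instance x and counterfactual x_cf,
    with each feature scaled by its median absolute deviation (MAD).

    MAD scaling makes features with different ranges comparable and is
    robust to outliers; the L1 norm itself favours sparse changes,
    i.e. counterfactuals that touch few features.
    """
    mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)
    mad = np.where(mad == 0, 1.0, mad)  # guard against constant features
    return float(np.sum(np.abs(x - x_cf) / mad))
```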
How to generate counterfactuals
Minimize the Loss Function

$L(x, x', y', \lambda) = \lambda \cdot (\hat{f}(x') - y')^2 + d(x, x')$

- $x$ → original instance
- $x'$ → current counterfactual
- $y'$ → predefined output
- $\lambda$ → regularisation (balances the prediction term against the distance term)
The distance $d(x, x')$ between the original instance and the counterfactual should be minimized. Stop when the constraint $|\hat{f}(x') - y'| \le \epsilon$ is reached. So in total:
- Select instance $x$, desired outcome $y'$, tolerance $\epsilon$ and a low initial value for $\lambda$
- Sample a random instance as the initial counterfactual
- Optimize the loss
- While the constraint $|\hat{f}(x') - y'| \le \epsilon$ is not fulfilled
    - Increase $\lambda$ (more attention on the constraint part)
    - Optimize the loss
- Return the counterfactual with minimal loss
- Repeat steps 2-4 to generate multiple counterfactuals (a Python sketch of the whole loop follows below)
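A minimal, model-agnostic sketch of the whole recipe. All names and hyperparameters (`eps`, `lam`, `lr`, the iteration caps) are illustrative assumptions; the gradient is estimated by finite differences, so only the prediction function is needed:

```python
import numpy as np

def generate_counterfactual(predict, x, y_target, X_train,
                            eps=0.05, lam=1e-3, lam_growth=2.0,
                            n_steps=200, lr=0.05, seed=None):
    """Search for a counterfactual x' with |predict(x') - y_target| <= eps.

    predict maps a 1-D feature vector to a scalar prediction; gradients
    are estimated numerically, so no access to model internals is needed.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # MAD weights for the distance term (see the sketch above)
    mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)
    mad = np.where(mad == 0, 1.0, mad)

    def loss(x_cf, lam):
        # lambda * (f(x') - y')^2 + d(x, x')
        return lam * (predict(x_cf) - y_target) ** 2 + np.sum(np.abs(x - x_cf) / mad)

    # Step 2: sample a random training instance as the initial counterfactual
    x_cf = np.asarray(X_train[rng.integers(len(X_train))], dtype=float).copy()

    for _ in range(30):  # cap on the number of lambda increases
        # Steps 3/4b: minimize the loss for the current lambda via
        # finite-difference gradient descent (keeps the method model-agnostic)
        for _ in range(n_steps):
            base = loss(x_cf, lam)
            grad = np.empty_like(x_cf)
            h = 1e-4
            for j in range(x_cf.size):
                x_h = x_cf.copy()
                x_h[j] += h
                grad[j] = (loss(x_h, lam) - base) / h
            x_cf -= lr * grad
        # Step 4: stop once the prediction constraint is fulfilled,
        # otherwise put more weight on the prediction term
        if abs(predict(x_cf) - y_target) <= eps:
            break
        lam *= lam_growth
    return x_cf
```

Repeating the call with different `seed` values yields multiple counterfactuals (step 5), from which the most diverse or lowest-loss ones can be reported.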
Pros
- clear interpretation
- two results
    - the counterfactual itself
    - what was changed
- Model-Agnostic, and no access to the underlying data is needed (good for privacy)
- works even for systems without ML
- easy to implement
Cons
- Instability via the Rashomon Effect → multiple equally good counterfactuals exist for the same instance, and different ones can tell contradictory stories