Counterfactual

Local Explanation that changes a datapoint in a minimal way so that it achieves a different, predefined output. Often a very good Explanation, because it shows what is most important locally. Often Model-Agnostic.

If X had not occurred, Y would not have occurred.

Unlike prototypes, counterfactuals do not have to be actual instances from the training data; they can be a new combination of feature values.

  • human-friendly

    • Contrastive → contrasts with the current instance via a “what-if” scenario
    • Selectivity → small number of feature changes
  • Rashomon Effect

    • there are multiple different counterfactuals that all change the prediction equally well
  • solved by

    • showing all counterfactuals
    • selecting only the best ones (by some measure)
      • match the predefined prediction as closely as possible
      • as similar as possible to the original instance regarding feature values
      • change as few features as possible
      • generate multiple diverse explanations
      • likely feature values (see correlations etc.)
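The selection criteria above can be combined into a single ranking score. A minimal sketch in Python; the `score_counterfactual` helper and its weights are illustrative assumptions, not a standard implementation:

```python
import numpy as np

def score_counterfactual(x, x_cf, pred_cf, y_target):
    """Rank a candidate counterfactual by the criteria above (lower is better).

    The weights are illustrative, not canonical.
    """
    closeness = abs(pred_cf - y_target)      # hit the predefined prediction
    proximity = np.sum(np.abs(x - x_cf))     # stay similar to the original instance
    sparsity = np.count_nonzero(x != x_cf)   # change as few features as possible
    return closeness + proximity + 0.5 * sparsity

x = np.array([1.0, 2.0, 3.0])
candidates = [
    (np.array([1.0, 2.0, 0.5]), 0.9),  # (candidate, model prediction for it)
    (np.array([0.0, 0.0, 3.0]), 0.9),
]
best = min(candidates, key=lambda c: score_counterfactual(x, c[0], c[1], y_target=1.0))
```

Diversity is not captured by this per-candidate score; it would require comparing the selected counterfactuals against each other.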

How to generate counterfactuals

Minimize the Loss Function (Wachter et al.)

L(x, x′, y′, λ) = λ · (f̂(x′) − y′)² + d(x, x′)

  • x → original instance
  • x′ → current counterfactual
  • y′ → predefined output
  • λ → regularisation (weight of the prediction term)
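This loss can be sketched in Python, assuming the Wachter et al. formulation with a MAD-weighted Manhattan distance; the toy linear `predict` function and all concrete values are illustrative:

```python
import numpy as np

def wachter_loss(x_cf, x, y_target, lam, predict, mad):
    # lam * (prediction gap)^2 + MAD-weighted Manhattan distance
    prediction_term = lam * (predict(x_cf) - y_target) ** 2
    distance_term = np.sum(np.abs(x - x_cf) / mad)
    return prediction_term + distance_term

# toy linear "model" standing in for the real predictor
predict = lambda v: float(v @ np.array([0.5, -0.25]))
x = np.array([2.0, 1.0])        # original instance
mad = np.array([1.0, 1.0])      # per-feature median absolute deviation
loss = wachter_loss(np.array([2.5, 1.0]), x, y_target=1.0, lam=1.0,
                    predict=predict, mad=mad)
```

Weighting each feature by its median absolute deviation makes the distance comparable across features with different scales.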

The distance d(x, x′) between the original instance and the counterfactual should be minimized. Stop when the constraint |f̂(x′) − y′| ≤ ε is reached. So in total:

  1. Select an instance x, the desired outcome y′, a tolerance ε and a low initial value for λ
  2. Sample a random instance as initial counterfactual x′
  3. Optimize the loss with the initial counterfactual
  4. While the constraint |f̂(x′) − y′| ≤ ε is not fulfilled
    1. Increase λ (more attention on the constraint part)
    2. Optimize the loss
    3. Return the counterfactual with minimal loss
  5. Repeat 2-4 to generate multiple counterfactuals
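The steps above can be sketched end to end. A toy sigmoid model and a naive random local search stand in for the real black-box model and optimizer; all names and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy model standing in for the black box: sigmoid of a linear score
predict = lambda v: float(1.0 / (1.0 + np.exp(-(v[0] - v[1]))))

def random_search(loss, x0, lam, steps=500):
    """Naive random local search standing in for a proper optimizer."""
    best_x, best_loss = x0, loss(x0, lam)
    for _ in range(steps):
        cand = best_x + rng.normal(scale=0.1, size=x0.shape)
        cand_loss = loss(cand, lam)
        if cand_loss < best_loss:
            best_x, best_loss = cand, cand_loss
    return best_x

def find_counterfactual(x, y_target, eps=0.05, lam=0.1, max_rounds=50):
    def loss(x_cf, lam):
        # prediction term plus Manhattan distance to the original instance
        return lam * (predict(x_cf) - y_target) ** 2 + np.sum(np.abs(x - x_cf))

    x_cf = x + rng.normal(size=x.shape)        # step 2: random initial counterfactual
    x_cf = random_search(loss, x_cf, lam)      # step 3: optimize the loss
    for _ in range(max_rounds):                # step 4: while constraint not fulfilled
        if abs(predict(x_cf) - y_target) <= eps:
            break
        lam *= 2                               # step 4.1: more weight on the constraint part
        x_cf = random_search(loss, x_cf, lam)  # step 4.2: optimize again
    return x_cf                                # counterfactual with minimal loss

cf = find_counterfactual(np.array([0.0, 0.0]), y_target=0.9)
```

Rerunning `find_counterfactual` with fresh random starts (step 5) yields multiple, possibly diverse, counterfactuals.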

Pros

  • clear interpretation
  • two results
    • counterfactual
    • what was changed
  • Model-Agnostic and no access to the data is needed (privacy)
  • also works for systems that do not use ML
  • easy to implement

Cons

Software