Shapley Values
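Game-theoretic method that fairly distributes a total payout among the players according to their individual contributions. For model interpretation, the players are the feature values and the payout is the prediction.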
Definition
The Shapley value of a feature value is its marginal contribution to the prediction, averaged (with Shapley weights) over all possible feature subsets (coalitions).
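For feature j, feature set F, and value function v(S) (the payout when only the features in S are known):
\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left( v(S \cup \{j\}) - v(S) \right)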
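To make the definition concrete, here is a minimal brute-force sketch that enumerates every coalition. The names `exact_shapley` and `toy_value`, and the toy payouts themselves, are assumptions made up for illustration; they do not come from any library.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, players):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(n):
            # Shapley weight of any coalition S with |S| = size that excludes j
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # marginal contribution of j when it joins coalition S
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


def toy_value(coalition):
    """Hypothetical payout: two main effects, one interaction, 'wind' is a dummy."""
    payout = 0.0
    if "temp" in coalition:
        payout += 10.0
    if "humidity" in coalition:
        payout += 4.0
    if "temp" in coalition and "humidity" in coalition:
        payout += 6.0  # interaction bonus, split evenly by the Shapley value
    return payout


phi = exact_shapley(toy_value, ["temp", "humidity", "wind"])
print(phi)
```

In this toy game the interaction bonus of 6 is shared equally, so "temp" receives 13, "humidity" receives 7, and "wind" receives 0 (up to floating-point rounding). The enumeration visits every subset for every player, which is why it only works for a handful of features.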
Efficiency
- the Shapley values of all features sum to the difference between the prediction and the average prediction
Symmetry
- two features that contribute equally to every coalition receive equal Shapley values (the converse does not hold in general)
Dummy
- a feature that does not change the prediction, whichever coalition it joins, must have a Shapley value of zero
Additivity
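- Shapley values are linear in the payout: for a combined payout v + w, the Shapley value of a feature is the sum of its Shapley values for v and for w (e.g. for an averaged ensemble, average the per-model Shapley values)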
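The four properties can also be written out formally. The notation below is an assumption chosen for this sketch: F is the set of all features, v(S) is the payout of coalition S, and \phi_j(v) is the Shapley value of feature j in game v.

```latex
\begin{align*}
\text{Efficiency:} \quad & \sum_{j \in F} \phi_j(v) = v(F) - v(\emptyset) \\
\text{Symmetry:} \quad & v(S \cup \{j\}) = v(S \cup \{k\}) \;\; \forall S \subseteq F \setminus \{j, k\}
  \;\Rightarrow\; \phi_j(v) = \phi_k(v) \\
\text{Dummy:} \quad & v(S \cup \{j\}) = v(S) \;\; \forall S \subseteq F \setminus \{j\}
  \;\Rightarrow\; \phi_j(v) = 0 \\
\text{Additivity:} \quad & \phi_j(v + w) = \phi_j(v) + \phi_j(w)
\end{align*}
```

For a model prediction, v(F) corresponds to the prediction for the data point and v(\emptyset) to the average prediction, which gives the efficiency statement in the bullet above.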
Pseudocode
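featureA = "temp"
X_local = ...  # local datapoint to explain
shapley_value = 0.0
for _ in range(N):  # N random coalitions
    # coalition: a random feature subset that contains featureA
    feature_subset = random_subset_that_contains(featureA)
    absent = all_features - feature_subset  # features outside the coalition
    for _ in range(1_000):  # random fill-ins per coalition
        X = X_local.copy()
        # marginalize the absent features by drawing random values from the data
        X[absent] = random_feature(absent)
        # same point, but featureA is replaced by a random value as well
        X_withoutA = X.copy()
        X_withoutA[featureA] = random_feature(featureA)
        # prefactor: Shapley weight of the coalition, corrected for how subsets are sampled
        shapley_value += prefactor(feature_subset) * (f(X) - f(X_withoutA))
shapley_value = shapley_value / (N * 1_000)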
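The sampling loop above can be made runnable in several ways. The sketch below swaps in permutation sampling: the coalition is the set of features that precede the feature of interest in a random ordering, which makes the prefactor equal to 1. The function name `sample_shapley`, the dict-based data layout, the linear toy model `f`, and the background rows are all assumptions made up for illustration.

```python
import random


def sample_shapley(f, x_local, background, feature, n_samples=5_000, seed=0):
    """Monte Carlo estimate of the Shapley value of `feature` for f(x_local)."""
    rng = random.Random(seed)
    names = list(x_local)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice(background)  # random row supplies values for absent features
        order = names[:]
        rng.shuffle(order)  # random feature ordering
        # coalition = features that come before `feature` in the ordering
        coalition = set(order[:order.index(feature)])
        x_with = {k: (x_local[k] if k in coalition or k == feature else z[k])
                  for k in names}
        x_without = dict(x_with)
        x_without[feature] = z[feature]  # additionally marginalize `feature`
        total += f(x_with) - f(x_without)
    return total / n_samples


def f(x):
    """Hypothetical linear model; 'wind' has no influence on the prediction."""
    return 2.0 * x["temp"] + 3.0 * x["humidity"]


# Hypothetical background data standing in for the training set.
background = [
    {"temp": 10.0, "humidity": 0.2, "wind": 1.0},
    {"temp": 20.0, "humidity": 0.4, "wind": 5.0},
    {"temp": 30.0, "humidity": 0.6, "wind": 2.0},
    {"temp": 40.0, "humidity": 0.8, "wind": 8.0},
]
x_local = {"temp": 35.0, "humidity": 0.5, "wind": 3.0}

estimate_temp = sample_shapley(f, x_local, background, "temp")
estimate_wind = sample_shapley(f, x_local, background, "wind")
print(estimate_temp, estimate_wind)
```

For this linear toy model the estimate for "temp" should land near 2 * (35 - 25) = 20, since 25 is the mean background temperature, and the estimate for "wind" is zero because the model ignores it. Drawing all absent values from one background row, rather than feature by feature, is a design choice of this sketch.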
Pros
- Model-Agnostic
- “Fair payout” properties
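- solid theory: the Shapley value is the only attribution that satisfies efficiency, symmetry, dummy, and additivity together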
Cons
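- in general only approximations are feasible, because the exact value requires evaluating every coalition and their number grows exponentially with the number of features
- not selective: explanations always contain all features, so there are no sparse explanations
- access to the training data is needed to draw replacement values for absent features
- unrealistic data points if features are correlated, because absent features are replaced by random values drawn independently of the features that are kept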