Steve = pd.DataFrame({
    "Gender": [1],
    "Age": [76],
    "Cardiovascular.Diseases": [1],
    "Diabetes": [0],
    "Neurological.Diseases": [0],
    "Kidney.Diseases": [0],
    "Cancer": [0]
})
model_rfc_tuned.predict_proba(Steve)
# array([[0.65611756, 0.34388244]])
Step 8. Shapley values and the Break-down plots
From the model developer's perspective, we are often interested in the global behaviour of a model, i.e. whether it has high performance or how its predictions change on average as a function of some feature. But the user's perspective is different. In most cases, a user is interested in an individual prediction related to him or her. We often hear about the "right to explanation", which means that for a model prediction, we should be able to find out which variables significantly influenced it. Especially for high-stakes decisions, we should enrich model predictions with as much information as possible to support informed and responsible decisions.
For tabular data, one of the most commonly used techniques for local variable attribution is Shapley values. The key idea behind this method is to analyze a sequence of conditional expected values. This way, we can trace how the conditional mean moves from the average model response to the model prediction for the observation of interest \(x^*\).
Let's consider a sequence of expected values for model predictions conditioned on an expanding sequence of variables:
\[ \mu = E[f(X)], \qquad \mu_{x_1} = E[f(X) \mid X_1 = x_1^*], \qquad \mu_{x_1,x_2} = E[f(X) \mid X_1 = x_1^*, X_2 = x_2^*], \qquad \ldots \]
By looking at consecutive differences \(\mu_{x_1}-\mu\), \(\mu_{x_1,x_2}-\mu_{x_1}\), and so on, one can calculate the added effects of individual variables; see an example in Figure 1. This sounds like a straightforward solution; however, there are two issues with this approach.
The first is that it is not easy to estimate the conditional expected value. In most implementations, it is assumed that features are independent; then we can estimate \(\mu_{K}\) as an average model response with the variables in the set \(K\) replaced by the corresponding values from the observation \(x^*\). So a crude estimate would be
\[ \widehat{\mu}_{K} = \frac{1}{n} \sum_{i=1}^n f(x_1^o, x_2^o, \ldots, x_p^o), \quad \text{where } \begin{cases} x_j^o = x_j^*, & \text{if } j \in K, \\ x_j^o = x_j^i, & \text{if } j \notin K. \end{cases} \]
The second issue is that these effects may depend on the order of conditioning. How to solve this problem? The Shapley values method calculates attributions as an average of all (or at least a large number of random) orderings, while the Break-down method uses a single ordering determined with a greedy heuristic that prefers variables with the largest attribution at the beginning.
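To make both points concrete, below is a minimal sketch of the crude estimator and of ordering-based attributions. It reuses model_rfc_tuned and the Steve observation from the snippets in this tutorial, plus a training data frame we call X here (the name X is our assumption for whatever data the model was fitted on); we also assume the positive class sits in the second column of predict_proba.
import numpy as np

def mu_hat(model, X, x_star, K):
    """Crude estimate of E[f(X) | X_K = x*_K] under the independence
    assumption: fix the columns in K at the values from x_star and
    average the model predictions over the whole data set."""
    X_mod = X.copy()
    for j in K:
        X_mod[j] = x_star[j].iloc[0]
    return model.predict_proba(X_mod)[:, 1].mean()  # positive-class column (assumption)

def effects_for_ordering(model, X, x_star, ordering):
    """Added effects mu_{K + {j}} - mu_K along one ordering of the variables."""
    effects, K = {}, []
    mu_prev = mu_hat(model, X, x_star, K)  # empty K gives the average response mu
    for j in ordering:
        K = K + [j]
        mu_next = mu_hat(model, X, x_star, K)
        effects[j] = mu_next - mu_prev
        mu_prev = mu_next
    return effects

# Shapley values: average the added effects over many random orderings;
# the Break-down method instead uses a single, greedily chosen ordering.
rng = np.random.default_rng(0)
variables = list(Steve.columns)
samples = [effects_for_ordering(model_rfc_tuned, X, Steve, list(rng.permutation(variables)))
           for _ in range(25)]
shapley = {v: np.mean([s[v] for s in samples]) for v in variables}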
Python snippets
Let's define an observation for which we will examine the model more closely. Let it be a 76-year-old man with cardiovascular disease. We show a local model analysis using model_rfc_tuned as an example.
The predict_parts function for a specified model and a specified observation calculates local variable attributions. The optional argument order forces the use of a specified sequence of variables. If it is not specified, then a greedy heuristic is used to start the conditioning with the most relevant variables. The results are presented in Figure 4.
breakdown_steve = explainer_rfc_tuned.predict_parts(Steve, type="break_down", random_state=0)
shap_steve = explainer_rfc_tuned.predict_parts(Steve, type="shap", random_state=0)
breakdown_steve.plot(show=False)
The alternative is to average over all (or at least many random) orderings of variables. This is how the Shapley values are calculated. The show_boxplots argument highlights the stability of the estimated attributions across different orderings.
shap_steve.plot(show=False)
The Shapley values are additive. For models with interactions, this is often too much of a simplification. An attribution method that can be used to address this problem is Break-down with interactions; just change the type argument to break_down_interactions.
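As a sketch (reusing explainer_rfc_tuned and Steve from the snippets above; the ibd_steve name is just our label), the call mirrors the earlier ones:
# Break-down with pairwise interactions, plotted like the other attributions
ibd_steve = explainer_rfc_tuned.predict_parts(Steve, type="break_down_interactions", random_state=0)
ibd_steve.plot(show=False)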
Note that, by default, functions such as model_parts, predict_parts, and model_profile do not calculate statistics on the entire data set (this may be time-consuming) but on n_samples random cases, and the entire procedure is repeated B times to estimate the error bars.
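For example, both knobs can be set explicitly when computing permutation importance. A minimal sketch, assuming model_parts exposes them as the N and B arguments (the argument names are an assumption here; check the signature of your package version):
# Assumed argument names: N = number of sampled rows, B = number of repetitions
vi = explainer_rfc_tuned.model_parts(N=500, B=25, random_state=0)
vi.plot(show=False)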
R snippets
Let's create an observation for which we will examine the local behaviour of a model more closely. Let it be a 76-year-old man with cardiovascular disease; we will call him Steve. Below we use model_ranger to illustrate the local model analysis.
Steve <- data.frame(Gender = factor("Male", c("Female", "Male")),
                    Age = 76,
                    Cardiovascular.Diseases = factor("Yes", c("No", "Yes")),
                    Diabetes = factor("No", c("No", "Yes")),
                    Neurological.Diseases = factor("No", c("No", "Yes")),
                    Kidney.Diseases = factor("No", c("No", "Yes")),
                    Cancer = factor("No", c("No", "Yes")))
predict(model_ranger, Steve)
# 0.322
The predict_parts function for a specified model and a specified observation calculates local variable attributions. The optional argument order forces the use of a specified sequence of variables. If it is not specified, then a greedy heuristic is used to start the conditioning with the most relevant variables.
bd_ranger <- predict_parts(model_ranger, Steve)
bd_ranger
#                                        contribution
# Ranger: intercept                             0.043
# Ranger: Age = 76                              0.181
# Ranger: Cardiovascular.Diseases = Yes         0.069
# Ranger: Gender = Male                         0.033
# Ranger: Kidney.Diseases = No                 -0.004
# Ranger: Cancer = No                          -0.002
# Ranger: Diabetes = No                         0.003
# Ranger: Neurological.Diseases = No            0.000
# Ranger: prediction                            0.322
plot(bd_ranger)
The alternative is to average over all (or at least many random) orderings of variables. This is how the Shapley values are calculated. The show_boxplots argument highlights the stability of the estimated attributions across different orderings.
shap_ranger <- predict_parts(model_ranger, Steve, type = "shap")
plot(shap_ranger, show_boxplots = TRUE)
The Shapley values are additive. For models with interactions, this is often too much of a simplification. An alternative variable attribution method that can be used to identify pairwise interactions is Break-down with interactions; just change the type argument to break_down_interactions. See Chapter 7 in the EMA book for more details.
Note that, by default, functions such as model_parts, predict_parts, and model_profile do not calculate statistics on the entire data set (this may be time-consuming) but on n_samples random cases, and the entire procedure is repeated B times to estimate the error bars.