Title: | Effects and Importances of Model Ingredients |
Version: | 2.3.0 |
Description: | Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <doi:10.48550/arXiv.1806.08915>. |
Depends: | R (≥ 3.5) |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Imports: | ggplot2, scales, gridExtra, methods |
Suggests: | DALEX, gower, ranger, testthat, r2d3, jsonlite, knitr, rmarkdown, covr |
URL: | https://ModelOriented.github.io/ingredients/, https://github.com/ModelOriented/ingredients |
BugReports: | https://github.com/ModelOriented/ingredients/issues |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2023-01-14 23:08:04 UTC; pbiecek |
Author: | Przemyslaw Biecek |
Maintainer: | Przemyslaw Biecek <przemyslaw.biecek@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-01-15 11:40:02 UTC |
Accumulated Local Effects Profiles aka ALEPlots
Description
Accumulated Local Effects Profiles accumulate local changes in Ceteris Paribus Profiles.
Function accumulated_dependence
calls ceteris_paribus
and then aggregate_profiles
.
Usage
accumulated_dependence(x, ...)
## S3 method for class 'explainer'
accumulated_dependence(
x,
variables = NULL,
N = 500,
variable_splits = NULL,
grid_points = 101,
...,
variable_type = "numerical"
)
## Default S3 method:
accumulated_dependence(
x,
data,
predict_function = predict,
label = class(x)[1],
variables = NULL,
N = 500,
variable_splits = NULL,
grid_points = 101,
...,
variable_type = "numerical"
)
## S3 method for class 'ceteris_paribus_explainer'
accumulated_dependence(x, ..., variables = NULL)
accumulated_dependency(x, ...)
Arguments
x |
an explainer created with function |
... |
other parameters |
variables |
names of variables for which profiles shall be calculated.
Will be passed to |
N |
number of observations used for calculation of partial dependence profiles.
By default, |
variable_splits |
named list of splits for variables, in most cases created with |
grid_points |
number of points for profile. Will be passed to |
variable_type |
a character. If |
data |
validation dataset Will be extracted from |
predict_function |
predict function Will be extracted from |
label |
name of the model. By default it's extracted from the |
Details
Find more detailes in the Accumulated Local Dependence Chapter.
Value
an object of the class aggregated_profiles_explainer
References
ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots https://cran.r-project.org/package=ALEPlot, Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
verbose = FALSE)
adp_glm <- accumulated_dependence(explain_titanic_glm,
N = 25, variables = c("age", "fare"))
head(adp_glm)
plot(adp_glm)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
adp_rf <- accumulated_dependence(explain_titanic_rf, N = 200, variable_type = "numerical")
plot(adp_rf)
adp_rf <- accumulated_dependence(explain_titanic_rf, N = 200, variable_type = "categorical")
plotD3(adp_rf, label_margin = 80, scale_plot = TRUE)
Aggregates Ceteris Paribus Profiles
Description
The function aggregate_profiles()
calculates an aggregate of ceteris paribus profiles.
It can be: Partial Dependence Profile (average across Ceteris Paribus Profiles),
Conditional Dependence Profile (local weighted average across Ceteris Paribus Profiles) or
Accumulated Local Dependence Profile (cummulated average local changes in Ceteris Paribus Profiles).
Usage
aggregate_profiles(
x,
...,
variable_type = "numerical",
groups = NULL,
type = "partial",
variables = NULL,
span = 0.25,
center = FALSE
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be calculated together |
variable_type |
a character. If |
groups |
a variable name that will be used for grouping.
By default |
type |
either |
variables |
if not |
span |
smoothing coefficient, by default |
center |
by default accumulated profiles start at 0. If |
Value
an object of the class aggregated_profiles_explainer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
head(titanic_imputed)
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
head(cp_rf)
# continuous variable
pdp_rf_p <- aggregate_profiles(cp_rf, variables = "age", type = "partial")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, variables = "age", type = "conditional")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, variables = "age", type = "accumulated")
pdp_rf_a$`_label_` <- "RF_accumulated"
plot(pdp_rf_p, pdp_rf_c, pdp_rf_a, color = "_label_")
pdp_rf <- aggregate_profiles(cp_rf, variables = "age",
groups = "gender")
head(pdp_rf)
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_rugs(cp_rf, variables = "age", color = "red") +
show_aggregated_profiles(pdp_rf, size = 3, color = "_label_")
# categorical variable
pdp_rf_p <- aggregate_profiles(cp_rf, variables = "class",
variable_type = "categorical", type = "partial")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, variables = "class",
variable_type = "categorical", type = "conditional")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, variables = "class",
variable_type = "categorical", type = "accumulated")
pdp_rf_a$`_label_` <- "RF_accumulated"
plot(pdp_rf_p, pdp_rf_c, pdp_rf_a, color = "_label_")
# or maybe flipped?
library(ggplot2)
plot(pdp_rf_p, pdp_rf_c, pdp_rf_a, color = "_label_") + coord_flip()
pdp_rf <- aggregate_profiles(cp_rf, variables = "class", variable_type = "categorical",
groups = "gender")
head(pdp_rf)
plot(pdp_rf, variables = "class")
# or maybe flipped?
plot(pdp_rf, variables = "class") + coord_flip()
Bind Multiple ggplot Objects
Description
This is an aesthetically efficient implementation of the
grid.arrange
Usage
bind_plots(..., byrow = FALSE)
Arguments
... |
( |
byrow |
( |
Value
(gtable
) A plottable object with plot()
.
Author(s)
Examples
library("DALEX")
library("ingredients")
titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_glm <- explain(titanic_glm,
data = titanic_imputed,
y = titanic_imputed$survived,
verbose = FALSE)
pdp_numerical <- partial_dependence(explain_glm, N = 50, variable_type = "numerical")
pdp_categorical <- partial_dependence(explain_glm, N = 50, variable_type = "categorical")
# Bind plots by rows
bind_plots(plot(pdp_numerical), plot(pdp_categorical), byrow = TRUE)
# Bind plots by columns
bind_plots(plot(pdp_numerical), plot(pdp_categorical), byrow = FALSE)
Calculate Oscillations for Ceteris Paribus Explainer
Description
Oscillations are proxies for local feature importance at the instance level. Find more details in Ceteris Paribus Oscillations Chapter.
Usage
calculate_oscillations(x, sort = TRUE, ...)
Arguments
x |
a ceteris paribus explainer produced with the |
sort |
a logical value. If |
... |
other arguments |
Value
an object of the class ceteris_paribus_oscillations
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)
# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_small, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_small[,-8],
y = titanic_small[,8])
cp_rf <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
calculate_oscillations(cp_rf)
library("ranger")
apartments_rf_model <- ranger(m2.price ~ construction.year + surface + floor +
no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model,
data = apartments_test[,-1],
y = apartments_test$m2.price,
label = "ranger forest",
verbose = FALSE)
apartment <- apartments_test[1,]
cp_rf <- ceteris_paribus(explainer_rf, apartment)
calculate_oscillations(cp_rf)
Internal Function for Individual Variable Profiles
Description
This function calculates individual variable profiles (ceteris paribus profiles), i.e. series of predictions from a model calculated for observations with altered single coordinate.
Usage
calculate_variable_profile(
data,
variable_splits,
model,
predict_function = predict,
...
)
## Default S3 method:
calculate_variable_profile(
data,
variable_splits,
model,
predict_function = predict,
...
)
Arguments
data |
set of observations. Profile will be calculated for every observation (every row) |
variable_splits |
named list of vectors. Elements of the list are vectors with points in which profiles should be calculated. See an example for more details. |
model |
a model that will be passed to the |
predict_function |
function that takes data and model and returns numeric predictions. Note that the ... arguments will be passed to this function. |
... |
other parameters that will be passed to the |
Details
Note that calculate_variable_profile
function is S3 generic.
If you want to work on non standard data sources (like H2O ddf, external databases)
you should overload it.
Value
a data frame with profiles for selected variables and selected observations
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Internal Function for Split Points for Selected Variables
Description
This function calculate candidate splits for each selected variable. For numerical variables splits are calculated as percentiles (in general uniform quantiles of the length grid_points). For all other variables splits are calculated as unique values.
Usage
calculate_variable_split(
data,
variables = colnames(data),
grid_points = 101,
variable_splits_type = "quantiles",
new_observation = NA
)
## Default S3 method:
calculate_variable_split(
data,
variables = colnames(data),
grid_points = 101,
variable_splits_type = "quantiles",
new_observation = NA
)
Arguments
data |
validation dataset. Is used to determine distribution of observations. |
variables |
names of variables for which splits shall be calculated |
grid_points |
number of points used for response path |
variable_splits_type |
how variable grids shall be calculated? Use "quantiles" (default) for percentiles or "uniform" to get uniform grid of points |
new_observation |
if specified (not |
Details
Note that calculate_variable_split
function is S3 generic.
If you want to work on non standard data sources (like H2O ddf, external databases)
you should overload it.
Value
A named list with splits for selected variables
Ceteris Paribus Profiles aka Individual Variable Profiles
Description
This explainer works for individual observations. For each observation it calculates Ceteris Paribus Profiles for selected variables. Such profiles can be used to hypothesize about model results if selected variable is changed. For this reason it is also called 'What-If Profiles'.
Usage
ceteris_paribus(x, ...)
## S3 method for class 'explainer'
ceteris_paribus(
x,
new_observation,
y = NULL,
variables = NULL,
variable_splits = NULL,
grid_points = 101,
variable_splits_type = "quantiles",
...
)
## Default S3 method:
ceteris_paribus(
x,
data,
predict_function = predict,
new_observation,
y = NULL,
variables = NULL,
variable_splits = NULL,
grid_points = 101,
variable_splits_type = "quantiles",
variable_splits_with_obs = FALSE,
label = class(x)[1],
...
)
Arguments
x |
an explainer created with the |
... |
other parameters |
new_observation |
a new observation with columns that corresponds to variables used in the model |
y |
true labels for |
variables |
names of variables for which profiles shall be calculated.
Will be passed to |
variable_splits |
named list of splits for variables, in most cases created with |
grid_points |
maximum number of points for profile calculations. Note that the finaln number of points may be lower than |
variable_splits_type |
how variable grids shall be calculated? Use "quantiles" (default) for percentiles or "uniform" to get uniform grid of points |
data |
validation dataset. It will be extracted from |
predict_function |
predict function. It will be extracted from |
variable_splits_with_obs |
if |
label |
name of the model. By default it's extracted from the |
Details
Find more details in Ceteris Paribus Chapter.
Value
an object of the class ceteris_paribus_explainer
.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)
# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_small,
family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_small[,-8],
y = titanic_small[,8])
cp_rf <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
cp_rf
plot(cp_rf, variables = "age")
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
# select few passangers
selected_passangers <- select_sample(titanic_imputed, n = 20)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_rugs(cp_rf, variables = "age", color = "red")
Ceteris Paribus 2D Plot
Description
This function calculates ceteris paribus profiles for grid of values spanned by two variables. It may be useful to identify or present interactions between two variables.
Usage
ceteris_paribus_2d(explainer, observation, grid_points = 101, variables = NULL)
Arguments
explainer |
a model to be explained, preprocessed by the |
observation |
a new observation for which predictions need to be explained |
grid_points |
number of points used for response path. Will be used for both variables |
variables |
if specified, then only these variables will be explained |
Value
an object of the class ceteris_paribus_2d_explainer
.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8])
cp_rf <- ceteris_paribus_2d(explain_titanic_glm, titanic_imputed[1,],
variables = c("age", "fare", "sibsp"))
head(cp_rf)
plot(cp_rf)
library("ranger")
set.seed(59)
apartments_rf_model <- ranger(m2.price ~., data = apartments)
explainer_rf <- explain(apartments_rf_model,
data = apartments_test[,-1],
y = apartments_test[,1],
label = "ranger forest",
verbose = FALSE)
new_apartment <- apartments_test[1,]
new_apartment
wi_rf_2d <- ceteris_paribus_2d(explainer_rf, observation = new_apartment,
variables = c("surface", "floor", "no.rooms"))
head(wi_rf_2d)
plot(wi_rf_2d)
Cluster Ceteris Paribus Profiles
Description
This function calculates aggregates of ceteris paribus profiles based on hierarchical clustering.
Usage
cluster_profiles(
x,
...,
aggregate_function = mean,
variable_type = "numerical",
center = FALSE,
k = 3,
variables = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
aggregate_function |
a function for profile aggregation. By default it's |
variable_type |
a character. If |
center |
shall profiles be centered before clustering |
k |
number of clusters for the hclust function |
variables |
if not |
Details
Find more detailes in the Clustering Profiles Chapter.
Value
an object of the class aggregated_profiles_explainer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
selected_passangers <- select_sample(titanic_imputed, n = 100)
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8])
cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
clust_rf <- cluster_profiles(cp_rf, k = 3, variables = "age")
plot(clust_rf)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf
pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)
clust_rf <- cluster_profiles(cp_rf, k = 3, variables = "age")
head(clust_rf)
plot(clust_rf, color = "_label_") +
show_aggregated_profiles(pdp_rf, color = "black", size = 3)
plot(cp_rf, color = "grey", variables = "age") +
show_aggregated_profiles(clust_rf, color = "_label_", size = 2)
clust_rf <- cluster_profiles(cp_rf, k = 3, center = TRUE, variables = "age")
head(clust_rf)
Conditional Dependence Profiles
Description
Conditional Dependence Profiles (aka Local Profiles) average localy Ceteris Paribus Profiles. Function 'conditional_dependence' calls 'ceteris_paribus' and then 'aggregate_profiles'.
Usage
conditional_dependence(x, ...)
## S3 method for class 'explainer'
conditional_dependence(
x,
variables = NULL,
N = 500,
variable_splits = NULL,
grid_points = 101,
...,
variable_type = "numerical"
)
## Default S3 method:
conditional_dependence(
x,
data,
predict_function = predict,
label = class(x)[1],
variables = NULL,
N = 500,
variable_splits = NULL,
grid_points = 101,
...,
variable_type = "numerical"
)
## S3 method for class 'ceteris_paribus_explainer'
conditional_dependence(x, ..., variables = NULL)
local_dependency(x, ...)
conditional_dependency(x, ...)
Arguments
x |
an explainer created with function |
... |
other parameters |
variables |
names of variables for which profiles shall be calculated.
Will be passed to |
N |
number of observations used for calculation of partial dependence profiles. By default |
variable_splits |
named list of splits for variables, in most cases created with |
grid_points |
number of points for profile. Will be passed to |
variable_type |
a character. If |
data |
validation dataset, will be extracted from |
predict_function |
predict function, will be extracted from |
label |
name of the model. By default it's extracted from the |
Details
Find more details in the Accumulated Local Dependence Chapter.
Value
an object of the class aggregated_profile_explainer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
verbose = FALSE)
cdp_glm <- conditional_dependence(explain_titanic_glm,
N = 150, variables = c("age", "fare"))
head(cdp_glm)
plot(cdp_glm)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
cdp_rf <- conditional_dependence(explain_titanic_rf, N = 200, variable_type = "numerical")
plot(cdp_rf)
cdp_rf <- conditional_dependence(explain_titanic_rf, N = 200, variable_type = "categorical")
plotD3(cdp_rf, label_margin = 100, scale_plot = TRUE)
Natural language description of feature importance explainer
Description
Generic function describe
generates a natural language
description of ceteris_paribus()
, aggregated_profiles()
and
feature_importance()
explanations what enchaces their interpretability.
Usage
## S3 method for class 'partial_dependence_explainer'
describe(
x,
nonsignificance_treshold = 0.15,
...,
display_values = FALSE,
display_numbers = FALSE,
variables = NULL,
label = "prediction"
)
describe(x, ...)
## S3 method for class 'ceteris_paribus_explainer'
describe(
x,
nonsignificance_treshold = 0.15,
...,
display_values = FALSE,
display_numbers = FALSE,
variables = NULL,
label = "prediction"
)
## S3 method for class 'feature_importance_explainer'
describe(x, nonsignificance_treshold = 0.15, ...)
Arguments
x |
a ceteris paribus explanation produced with function |
nonsignificance_treshold |
a parameter specifying a treshold for variable importance |
... |
other arguments |
display_values |
allows for displaying variable values |
display_numbers |
allows for displaying numerical values |
variables |
a character of a single variable name to be described |
label |
label for model's prediction |
Details
Function describe.ceteris_paribus()
generates a natural language description of
ceteris paribus profile. The description summarizes variable values, that would change
model's prediction at most. If a ceteris paribus profile for multiple variables is passed,
variables
must specify a single variable to be described. Works only for a ceteris paribus profile
for one observation. In current version only categorical values are discribed. For display_numbers = TRUE
three most important variable values are displayed, while display_numbers = FALSE
displays
all the important variables, however without further details.
Function describe.ceteris_paribus()
generates a natural language description of
ceteris paribus profile. The description summarizes variable values, that would change
model's prediction at most. If a ceteris paribus profile for multiple variables is passed,
variables
must specify a single variable to be described. Works only for a ceteris paribus profile
for one observation. For display_numbers = TRUE
three most important variable values are displayed, while display_numbers = FALSE
displays
all the important variables, however without further details.
Function describe.feature_importance_explainer()
generates a natural language
description of feature importance explanation. It prints the number of important variables, that
have significant dropout difference from the full model, depending on nonsignificance_treshold
.
The description prints the three most important variables for the model's prediction.
The current design of DALEX explainer does not allow for displaying variables values.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 10)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
describe(pdp, variables = "gender")
library("DALEX")
library("ingredients")
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passanger <- select_sample(titanic_imputed, n = 1, seed = 123)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passanger)
plot(cp_rf, variable_type = "categorical")
describe(cp_rf, variables = "class", label = "the predicted probability")
library("DALEX")
library("ingredients")
lm_model <- lm(m2.price~., data = apartments)
explainer_lm <- explain(lm_model, data = apartments[,-1], y = apartments[,1])
fi_lm <- feature_importance(explainer_lm, loss_function = DALEX::loss_root_mean_square)
plot(fi_lm)
describe(fi_lm)
Feature Importance
Description
This function calculates permutation based feature importance. For this reason it is also called the Variable Dropout Plot.
Usage
feature_importance(x, ...)
## S3 method for class 'explainer'
feature_importance(
x,
loss_function = DALEX::loss_root_mean_square,
...,
type = c("raw", "ratio", "difference"),
n_sample = NULL,
B = 10,
variables = NULL,
variable_groups = NULL,
N = n_sample,
label = NULL
)
## Default S3 method:
feature_importance(
x,
data,
y,
predict_function = predict,
loss_function = DALEX::loss_root_mean_square,
...,
label = class(x)[1],
type = c("raw", "ratio", "difference"),
n_sample = NULL,
B = 10,
variables = NULL,
N = n_sample,
variable_groups = NULL
)
Arguments
x |
an explainer created with function |
... |
other parameters |
loss_function |
a function thet will be used to assess variable importance |
type |
character, type of transformation that should be applied for dropout loss.
"raw" results raw drop losses, "ratio" returns |
n_sample |
alias for |
B |
integer, number of permutation rounds to perform on each variable. By default it's |
variables |
vector of variables. If |
variable_groups |
list of variables names vectors. This is for testing joint variable importance.
If |
N |
number of observations that should be sampled for calculation of variable importance.
If |
label |
name of the model. By default it's extracted from the |
data |
validation dataset, will be extracted from |
y |
true labels for |
predict_function |
predict function, will be extracted from |
Details
Find more details in the Feature Importance Chapter.
Value
an object of the class feature_importance
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8])
fi_glm <- feature_importance(explain_titanic_glm, B = 1)
plot(fi_glm)
fi_glm_joint1 <- feature_importance(explain_titanic_glm,
variable_groups = list("demographics" = c("gender", "age"),
"ticket_type" = c("fare")),
label = "lm 2 groups")
plot(fi_glm_joint1)
fi_glm_joint2 <- feature_importance(explain_titanic_glm,
variable_groups = list("demographics" = c("gender", "age"),
"wealth" = c("fare", "class"),
"family" = c("sibsp", "parch"),
"embarked" = "embarked"),
label = "lm 5 groups")
plot(fi_glm_joint2, fi_glm_joint1)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
fi_rf <- feature_importance(explain_titanic_rf)
plot(fi_rf)
fi_rf <- feature_importance(explain_titanic_rf, B = 6) # 6 replications
plot(fi_rf)
fi_rf_group <- feature_importance(explain_titanic_rf,
variable_groups = list("demographics" = c("gender", "age"),
"wealth" = c("fare", "class"),
"family" = c("sibsp", "parch"),
"embarked" = "embarked"),
label = "rf 4 groups")
plot(fi_rf_group, fi_rf)
HR_rf_model <- ranger(status ~., data = HR, probability = TRUE)
explainer_rf <- explain(HR_rf_model, data = HR, y = HR$status,
model_info = list(type = 'multiclass'))
fi_rf <- feature_importance(explainer_rf, type = "raw",
loss_function = DALEX::loss_cross_entropy)
head(fi_rf)
plot(fi_rf)
HR_glm_model <- glm(status == "fired"~., data = HR, family = "binomial")
explainer_glm <- explain(HR_glm_model, data = HR, y = as.numeric(HR$status == "fired"))
fi_glm <- feature_importance(explainer_glm, type = "raw",
loss_function = DALEX::loss_root_mean_square)
head(fi_glm)
plot(fi_glm)
Partial Dependence Profiles
Description
Partial Dependence Profiles are averages from Ceteris Paribus Profiles.
Function partial_dependence
calls ceteris_paribus
and then aggregate_profiles
.
Usage
partial_dependence(x, ...)
## S3 method for class 'explainer'
partial_dependence(
x,
variables = NULL,
N = 500,
variable_splits = NULL,
grid_points = 101,
...,
variable_type = "numerical"
)
## Default S3 method:
partial_dependence(
x,
data,
predict_function = predict,
label = class(x)[1],
variables = NULL,
grid_points = 101,
variable_splits = NULL,
N = 500,
...,
variable_type = "numerical"
)
## S3 method for class 'ceteris_paribus_explainer'
partial_dependence(x, ..., variables = NULL)
partial_dependency(x, ...)
Arguments
x |
an explainer created with function |
... |
other parameters |
variables |
names of variables for which profiles shall be calculated.
Will be passed to |
N |
number of observations used for calculation of partial dependence profiles. By default |
variable_splits |
named list of splits for variables, in most cases created with |
grid_points |
number of points for profile. Will be passed to |
variable_type |
a character. If |
data |
validation dataset, will be extracted from |
predict_function |
predict function, will be extracted from |
label |
name of the model. By default it's extracted from the |
Details
Find more details in the Partial Dependence Profiles Chapter.
Value
an object of the class aggregated_profiles_explainer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
verbose = FALSE)
pdp_glm <- partial_dependence(explain_titanic_glm,
N = 25, variables = c("age", "fare"))
head(pdp_glm)
plot(pdp_glm)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
pdp_rf <- partial_dependence(explain_titanic_rf, variable_type = "numerical")
plot(pdp_rf)
pdp_rf <- partial_dependence(explain_titanic_rf, variable_type = "categorical")
plotD3(pdp_rf, label_margin = 80, scale_plot = TRUE)
Plots Aggregated Profiles
Description
Function plot.aggregated_profiles_explainer
plots partial dependence plot or accumulated effect plot.
It works in a similar way to plot.ceteris_paribus
, but instead of individual profiles
show average profiles for each variable listed in the variables
vector.
Usage
## S3 method for class 'aggregated_profiles_explainer'
plot(
x,
...,
size = 1,
alpha = 1,
color = "_label_",
facet_ncol = NULL,
facet_scales = "free_x",
variables = NULL,
title = NULL,
subtitle = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color, or hex code for a color, or |
facet_ncol |
number of columns for the |
facet_scales |
a character value for the |
variables |
if not |
title |
a character. Partial and accumulated dependence explainers have deafult value. |
subtitle |
a character. If |
Value
a ggplot2
object
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
verbose = FALSE)
pdp_rf_p <- partial_dependence(explain_titanic_glm, N = 50)
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_l <- conditional_dependence(explain_titanic_glm, N = 50)
pdp_rf_l$`_label_` <- "RF_local"
pdp_rf_a<- accumulated_dependence(explain_titanic_glm, N = 50)
pdp_rf_a$`_label_` <- "RF_accumulated"
head(pdp_rf_p)
plot(pdp_rf_p, pdp_rf_l, pdp_rf_a, color = "_label_")
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf
pdp_rf_p <- aggregate_profiles(cp_rf, variables = "age", type = "partial")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, variables = "age", type = "conditional")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, variables = "age", type = "accumulated")
pdp_rf_a$`_label_` <- "RF_accumulated"
head(pdp_rf_p)
plot(pdp_rf_p)
plot(pdp_rf_p, pdp_rf_c, pdp_rf_a)
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_rugs(cp_rf, variables = "age", color = "red") +
show_aggregated_profiles(pdp_rf_p, size = 2)
Plot Ceteris Paribus 2D Explanations
Description
This function plots What-If Plots for a single prediction / observation.
Usage
## S3 method for class 'ceteris_paribus_2d_explainer'
plot(
x,
...,
facet_ncol = NULL,
add_raster = TRUE,
add_contour = TRUE,
bins = 3,
add_observation = TRUE,
pch = "+",
size = 6
)
Arguments
x |
a ceteris paribus explainer produced with the |
... |
currently will be ignored |
facet_ncol |
number of columns for the |
add_raster |
if |
add_contour |
if |
bins |
number of contours to be added |
add_observation |
if |
pch |
character, symbol used to plot observations |
size |
numeric, size of individual datapoints |
Value
a ggplot2
object
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
apartments_rf_model <- ranger(m2.price ~., data = apartments)
explainer_rf <- explain(apartments_rf_model,
data = apartments_test[,-1],
y = apartments_test[,1],
verbose = FALSE)
new_apartment <- apartments_test[1,]
new_apartment
wi_rf_2d <- ceteris_paribus_2d(explainer_rf, observation = new_apartment)
head(wi_rf_2d)
plot(wi_rf_2d)
plot(wi_rf_2d, add_contour = FALSE)
plot(wi_rf_2d, add_observation = FALSE)
plot(wi_rf_2d, add_raster = FALSE)
# HR data
model <- ranger(status ~ gender + age + hours + evaluation + salary, data = HR,
probability = TRUE)
pred1 <- function(m, x) predict(m, x)$predictions[,1]
explainer_rf_fired <- explain(model,
data = HR[,1:5],
y = as.numeric(HR$status == "fired"),
predict_function = pred1,
label = "fired")
new_emp <- HR[1,]
new_emp
wi_rf_2d <- ceteris_paribus_2d(explainer_rf_fired, observation = new_emp)
head(wi_rf_2d)
plot(wi_rf_2d)
Plots Ceteris Paribus Profiles
Description
Function plot.ceteris_paribus_explainer
plots Individual Variable Profiles for selected observations.
Various parameters help to decide what should be plotted, profiles, aggregated profiles, points or rugs.
Find more details in Ceteris Paribus Chapter.
Usage
## S3 method for class 'ceteris_paribus_explainer'
plot(
x,
...,
size = 1,
alpha = 1,
color = "#46bac2",
variable_type = "numerical",
facet_ncol = NULL,
facet_scales = NULL,
variables = NULL,
title = "Ceteris Paribus profile",
subtitle = NULL,
categorical_type = "profiles"
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variable_type |
a character. If |
facet_ncol |
number of columns for the |
facet_scales |
a character value for the |
variables |
if not |
title |
a character. Plot title. By default "Ceteris Paribus profile". |
subtitle |
a character. Plot subtitle. By default |
categorical_type |
a character. How categorical variables shall be plotted? Either |
Value
a ggplot2
object
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
verbose = FALSE)
cp_glm <- ceteris_paribus(explain_titanic_glm, titanic_imputed[1,])
cp_glm
plot(cp_glm, variables = "age")
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_rugs(cp_rf, variables = "age", color = "red")
selected_passangers <- select_sample(titanic_imputed, n = 1)
selected_passangers
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
plot(cp_rf) +
show_observations(cp_rf)
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age")
plot(cp_rf, variables = "class")
plot(cp_rf, variables = c("class", "embarked"), facet_ncol = 1)
plot(cp_rf, variables = c("class", "embarked"), facet_ncol = 1, categorical_type = "bars")
plotD3(cp_rf, variables = c("class", "embarked", "gender"),
variable_type = "categorical", scale_plot = TRUE,
label_margin = 70)
Plot Ceteris Paribus Oscillations
Description
This function plots local variable importance plots calculated as oscillations in the Ceteris Paribus Profiles.
Usage
## S3 method for class 'ceteris_paribus_oscillations'
plot(x, ..., bar_width = 10)
Arguments
x |
a ceteris paribus oscillation explainer produced with function |
... |
other explainers that shall be plotted together |
bar_width |
width of bars. By default |
Value
a ggplot2
object
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ranger")
apartments_rf_model <- ranger(m2.price ~., data = apartments)
explainer_rf <- explain(apartments_rf_model,
data = apartments_test[,-1],
y = apartments_test[,1],
label = "ranger forest",
verbose = FALSE)
apartment <- apartments_test[1:2,]
cp_rf <- ceteris_paribus(explainer_rf, apartment)
plot(cp_rf, color = "_ids_")
vips <- calculate_oscillations(cp_rf)
vips
plot(vips)
Plots Feature Importance
Description
This function plots variable importance calculated as changes in the loss function after variable drops.
It uses output from feature_importance
function that corresponds to
permutation based measure of variable importance.
Variables are sorted in the same order in all panels.
The order depends on the average drop out loss.
In different panels variable contributions may not look like sorted if variable
importance is different in different in different models.
Usage
## S3 method for class 'feature_importance_explainer'
plot(
x,
...,
max_vars = NULL,
show_boxplots = TRUE,
bar_width = 10,
desc_sorting = TRUE,
title = "Feature Importance",
subtitle = NULL
)
Arguments
x |
a feature importance explainer produced with the |
... |
other explainers that shall be plotted together |
max_vars |
maximum number of variables that shall be presented for for each model.
By default |
show_boxplots |
logical if |
bar_width |
width of bars. By default |
desc_sorting |
logical. Should the bars be sorted descending? By default TRUE |
title |
the plot's title, by default |
subtitle |
the plot's subtitle. By default - |
Details
Find more details in the Feature Importance Chapter.
Value
a ggplot2
object
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8])
fi_rf <- feature_importance(explain_titanic_glm, B = 1)
plot(fi_rf)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
fi_rf <- feature_importance(explain_titanic_rf)
plot(fi_rf)
HR_rf_model <- ranger(status~., data = HR, probability = TRUE)
explainer_rf <- explain(HR_rf_model, data = HR, y = HR$status,
verbose = FALSE, precalculate = FALSE)
fi_rf <- feature_importance(explainer_rf, type = "raw", max_vars = 3,
loss_function = DALEX::loss_cross_entropy)
head(fi_rf)
plot(fi_rf)
HR_glm_model <- glm(status == "fired"~., data = HR, family = "binomial")
explainer_glm <- explain(HR_glm_model, data = HR, y = as.numeric(HR$status == "fired"))
fi_glm <- feature_importance(explainer_glm, type = "raw",
loss_function = DALEX::loss_root_mean_square)
head(fi_glm)
plot(fi_glm)
Plots Ceteris Paribus Profiles in D3 with r2d3 Package.
Description
Function plotD3.ceteris_paribus_explainer
plots Individual Variable Profiles for selected observations.
It uses output from ceteris_paribus
function.
Various parameters help to decide what should be plotted, profiles, aggregated profiles, points or rugs.
Find more details in Ceteris Paribus Chapter.
Usage
plotD3(x, ...)
## S3 method for class 'ceteris_paribus_explainer'
plotD3(
x,
...,
size = 2,
alpha = 1,
color = "#46bac2",
variable_type = "numerical",
facet_ncol = 2,
scale_plot = FALSE,
variables = NULL,
chart_title = "Ceteris Paribus Profiles",
label_margin = 60,
show_observations = TRUE,
show_rugs = TRUE
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Set width of lines |
alpha |
a numeric between |
color |
a character. Set line color |
variable_type |
a character. If "numerical" then only numerical variables will be plotted. If "categorical" then only categorical variables will be plotted. |
facet_ncol |
number of columns for the |
scale_plot |
a logical. If |
variables |
if not |
chart_title |
a character. Set custom title |
label_margin |
a numeric. Set width of label margins in |
show_observations |
a logical. Adds observations layer to a plot. By default it's |
show_rugs |
a logical. Adds rugs layer to a plot. By default it's |
Value
a r2d3
object.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 10)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
plotD3(cp_rf, variables = c("age","parch","fare","sibsp"),
facet_ncol = 2, scale_plot = TRUE)
selected_passanger <- select_sample(titanic_imputed, n = 1)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passanger)
plotD3(cp_rf, variables = c("class", "embarked", "gender", "sibsp"),
facet_ncol = 2, variable_type = "categorical", label_margin = 100, scale_plot = TRUE)
Plots Aggregated Ceteris Paribus Profiles in D3 with r2d3 Package.
Description
Function plotD3.aggregated_profiles_explainer
plots an aggregate of ceteris paribus profiles.
It works in a similar way to plotD3.ceteris_paribus_explainer
but, instead of individual profiles,
show average profiles for each variable listed in the variables
vector.
Find more details in Ceteris Paribus Chapter.
Usage
## S3 method for class 'aggregated_profiles_explainer'
plotD3(
x,
...,
size = 2,
alpha = 1,
color = "#46bac2",
facet_ncol = 2,
scale_plot = FALSE,
variables = NULL,
chart_title = "Aggregated Profiles",
label_margin = 60
)
Arguments
x |
a aggregated profiles explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Set width of lines |
alpha |
a numeric between |
color |
a character. Set line/bar color |
facet_ncol |
number of columns for the |
scale_plot |
a logical. If |
variables |
if not |
chart_title |
a character. Set custom title |
label_margin |
a numeric. Set width of label margins in |
Value
a r2d3
object.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
# smaller data, quicker example
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)
# build a model
model_titanic_rf <- ranger(survived ~., data = titanic_small, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_small[,-8],
y = titanic_small[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_small, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
pdp_rf_p <- aggregate_profiles(cp_rf, type = "partial", variable_type = "numerical")
pdp_rf_p$`_label_` <- "RF_partial"
pdp_rf_c <- aggregate_profiles(cp_rf, type = "conditional", variable_type = "numerical")
pdp_rf_c$`_label_` <- "RF_conditional"
pdp_rf_a <- aggregate_profiles(cp_rf, type = "accumulated", variable_type = "numerical")
pdp_rf_a$`_label_` <- "RF_accumulated"
plotD3(pdp_rf_p, pdp_rf_c, pdp_rf_a, scale_plot = TRUE)
pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
pdp$`_label_` <- "RF_partial"
plotD3(pdp, variables = c("gender","class"), label_margin = 70)
Plot Feature Importance Objects in D3 with r2d3 Package.
Description
Function plotD3.feature_importance_explainer
plots dropouts for variables used in the model.
It uses output from feature_importance
function that corresponds to permutation based measure of feature importance.
Variables are sorted in the same order in all panels. The order depends on the average drop out loss.
In different panels variable contributions may not look like sorted if variable importance is different in different models.
Usage
## S3 method for class 'feature_importance_explainer'
plotD3(
x,
...,
max_vars = NULL,
show_boxplots = TRUE,
bar_width = 12,
split = "model",
scale_height = FALSE,
margin = 0.15,
chart_title = "Feature importance"
)
Arguments
x |
a feature importance explainer produced with the |
... |
other explainers that shall be plotted together |
max_vars |
maximum number of variables that shall be presented for for each model.
By default |
show_boxplots |
logical if |
bar_width |
width of bars in px. By default |
split |
either "model" or "feature" determines the plot layout |
scale_height |
a logical. If |
margin |
extend x axis domain range to adjust the plot.
Usually value between |
chart_title |
a character. Set custom title |
Value
a r2d3
object.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
lm_model <- lm(m2.price ~., data = apartments)
explainer_lm <- explain(lm_model,
data = apartments[,-1],
y = apartments[,1],
verbose = FALSE)
fi_lm <- feature_importance(explainer_lm,
loss_function = DALEX::loss_root_mean_square, B = 1)
head(fi_lm)
plotD3(fi_lm)
library("ranger")
rf_model <- ranger(m2.price~., data = apartments)
explainer_rf <- explain(rf_model,
data = apartments[,-1],
y = apartments[,1],
label = "ranger forest",
verbose = FALSE)
fi_rf <- feature_importance(explainer_rf, loss_function = DALEX::loss_root_mean_square)
head(fi_rf)
plotD3(fi_lm, fi_rf)
plotD3(fi_lm, fi_rf, split = "feature")
plotD3(fi_lm, fi_rf, max_vars = 3, bar_width = 16, scale_height = TRUE)
plotD3(fi_lm, fi_rf, max_vars = 3, bar_width = 16, split = "feature", scale_height = TRUE)
plotD3(fi_lm, margin = 0.2)
Prints Aggregated Profiles
Description
Prints Aggregated Profiles
Usage
## S3 method for class 'aggregated_profiles_explainer'
print(x, ...)
Arguments
x |
an individual variable profile explainer produced with the |
... |
other arguments that will be passed to |
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8])
selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
head(cp_rf)
pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed,
probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf
pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)
Prints Individual Variable Explainer Summary
Description
Prints Individual Variable Explainer Summary
Usage
## S3 method for class 'ceteris_paribus_explainer'
print(x, ...)
Arguments
x |
an individual variable profile explainer produced with the |
... |
other arguments that will be passed to |
Examples
library("DALEX")
library("ingredients")
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)
# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_small,
family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_small[,-8],
y = titanic_small[,8])
cp_glm <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
cp_glm
library("ranger")
apartments_rf_model <- ranger(m2.price ~., data = apartments)
explainer_rf <- explain(apartments_rf_model,
data = apartments_test[,-1],
y = apartments_test[,1],
label = "ranger forest",
verbose = FALSE)
apartments_small <- select_sample(apartments_test, 10)
cp_rf <- ceteris_paribus(explainer_rf, apartments_small)
cp_rf
Print Generic for Feature Importance Object
Description
Print Generic for Feature Importance Object
Usage
## S3 method for class 'feature_importance_explainer'
print(x, ...)
Arguments
x |
an explanation created with |
... |
other parameters. |
Value
a data frame.
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
verbose = FALSE)
fi_glm <- feature_importance(explain_titanic_glm)
fi_glm
Select Subset of Rows Closest to a Specified Observation
Description
Function select_neighbours
selects subset of rows from data set.
This is useful if data is large and we need just a sample to calculate profiles.
Usage
select_neighbours(
observation,
data,
variables = NULL,
distance = gower::gower_dist,
n = 20,
frac = NULL
)
Arguments
observation |
single observation |
data |
set of observations |
variables |
names of variables that shall be used for calculation of distance.
By default these are all variables present in |
distance |
the distance function, by default the |
n |
number of neighbors to select |
frac |
if |
Details
Note that select_neighbours()
function is S3 generic.
If you want to work on non standard data sources (like H2O ddf, external databases)
you should overload it.
Value
a data frame with selected rows
Examples
library("ingredients")
new_apartment <- DALEX::apartments[1,]
small_apartments <- select_neighbours(new_apartment, DALEX::apartments_test, n = 10)
new_apartment
small_apartments
Select Subset of Rows
Description
Function select_sample
selects subset of rows from data set.
This is useful if data is large and we need just a sample to calculate profiles.
Usage
select_sample(data, n = 100, seed = 1313)
Arguments
data |
set of observations. Profile will be calculated for every observation (every row) |
n |
number of observations to select. |
seed |
seed for random number generator. |
Details
Note that select_subsample()
function is S3 generic.
If you want to work on non standard data sources (like H2O ddf, external databases)
you should overload it.
Value
a data frame with selected rows
Examples
library("ingredients")
small_apartments <- select_sample(DALEX::apartments_test)
head(small_apartments)
Adds a Layer with Aggregated Profiles
Description
Function show_aggregated_profiles
adds a layer to a plot created
with plot.ceteris_paribus_explainer
.
Usage
show_aggregated_profiles(
x,
...,
size = 0.5,
alpha = 1,
color = "#371ea3",
variables = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variables |
if not |
Value
a ggplot2
layer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
selected_passangers <- select_sample(titanic_imputed, n = 100)
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8])
cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
pdp_rf <- aggregate_profiles(cp_rf, type = "partial", variables = "age")
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_aggregated_profiles(pdp_rf, size = 3)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf
pdp_rf <- aggregate_profiles(cp_rf, type = "partial", variables = "age")
head(pdp_rf)
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_rugs(cp_rf, variables = "age", color = "red") +
show_aggregated_profiles(pdp_rf, size = 3)
Adds a Layer with Observations to a Profile Plot
Description
Function show_observations
adds a layer to a plot created with
plot.ceteris_paribus_explainer
for selected observations.
Various parameters help to decide what should be plotted, profiles, aggregated profiles, points or rugs.
Usage
show_observations(
x,
...,
size = 2,
alpha = 1,
color = "#371ea3",
variable_type = "numerical",
variables = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variable_type |
a character. If |
variables |
if not |
Value
a ggplot2
layer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
rf_model <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explainer_rf <- explain(rf_model,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explainer_rf, selected_passangers)
cp_rf
plot(cp_rf, variables = "age", color = "grey") +
show_observations(cp_rf, variables = "age", color = "black") +
show_rugs(cp_rf, variables = "age", color = "red")
Adds a Layer with Profiles
Description
Function show_profiles
adds a layer to a plot created with
plot.ceteris_paribus_explainer
.
Usage
show_profiles(
x,
...,
size = 0.5,
alpha = 1,
color = "#371ea3",
variables = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variables |
if not |
Value
a ggplot2
layer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
selected_passangers <- select_sample(titanic_imputed, n = 100)
selected_john <- titanic_imputed[1,]
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "glm", verbose = FALSE)
cp_rf <- ceteris_paribus(explain_titanic_glm, selected_passangers)
cp_rf_john <- ceteris_paribus(explain_titanic_glm, selected_john)
plot(cp_rf, variables = "age") +
show_profiles(cp_rf_john, variables = "age", size = 2)
library("ranger")
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf_john <- ceteris_paribus(explain_titanic_rf, selected_john)
cp_rf
pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)
plot(cp_rf, variables = "age") +
show_observations(cp_rf, variables = "age") +
show_rugs(cp_rf, variables = "age", color = "red") +
show_profiles(cp_rf_john, variables = "age", color = "red", size = 2)
Adds a Layer with Residuals to a Profile Plot
Description
Function show_residuals
adds a layer to a plot created with
plot.ceteris_paribus_explainer
for selected observations.
Note that the y
argument has to be specified in the ceteris_paribus
function.
Usage
show_residuals(
x,
...,
size = 0.75,
alpha = 1,
color = c(`TRUE` = "#8bdcbe", `FALSE` = "#f05a71"),
variables = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variables |
if not |
Value
a ggplot2
layer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
library("ranger")
johny_d <- data.frame(
class = factor("1st", levels = c("1st", "2nd", "3rd", "deck crew", "engineering crew",
"restaurant staff", "victualling crew")),
gender = factor("male", levels = c("female", "male")),
age = 8,
sibsp = 0,
parch = 0,
fare = 72,
embarked = factor("Southampton", levels = c("Belfast", "Cherbourg", "Queenstown", "Southampton"))
)
model_titanic_rf <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explain_titanic_rf <- explain(model_titanic_rf,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
johny_neighbours <- select_neighbours(data = titanic_imputed,
observation = johny_d,
variables = c("age", "gender", "class",
"fare", "sibsp", "parch"),
n = 10)
cp_neighbours <- ceteris_paribus(explain_titanic_rf,
johny_neighbours,
y = johny_neighbours$survived == "yes",
variable_splits = list(age = seq(0,70, length.out = 1000)))
plot(cp_neighbours, variables = "age") +
show_observations(cp_neighbours, variables = "age")
cp_johny <- ceteris_paribus(explain_titanic_rf, johny_d,
variable_splits = list(age = seq(0,70, length.out = 1000)))
plot(cp_johny, variables = "age", size = 1.5, color = "#8bdcbe") +
show_profiles(cp_neighbours, variables = "age", color = "#ceced9") +
show_observations(cp_johny, variables = "age", size = 5, color = "#371ea3") +
show_residuals(cp_neighbours, variables = "age")
Adds a Layer with Rugs to a Profile Plot
Description
Function show_rugs
adds a layer to a plot created with
plot.ceteris_paribus_explainer
for selected observations.
Various parameters help to decide what should be plotted, profiles, aggregated profiles, points or rugs.
Usage
show_rugs(
x,
...,
size = 0.5,
alpha = 1,
color = "#371ea3",
variable_type = "numerical",
sides = "b",
variables = NULL
)
Arguments
x |
a ceteris paribus explainer produced with function |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
variable_type |
a character. If |
sides |
a string containing any of "trbl", for top, right, bottom, and left. Passed to geom rug. |
variables |
if not |
Value
a ggplot2
layer
References
Explanatory Model Analysis. Explore, Explain, and Examine Predictive Models. https://ema.drwhy.ai/
Examples
library("DALEX")
library("ingredients")
titanic_small <- select_sample(titanic_imputed, n = 500, seed = 1313)
# build a model
model_titanic_glm <- glm(survived ~ gender + age + fare,
data = titanic_small,
family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
data = titanic_small[,-8],
y = titanic_small[,8])
cp_glm <- ceteris_paribus(explain_titanic_glm, titanic_small[1,])
cp_glm
library("ranger")
rf_model <- ranger(survived ~., data = titanic_imputed, probability = TRUE)
explainer_rf <- explain(rf_model,
data = titanic_imputed[,-8],
y = titanic_imputed[,8],
label = "ranger forest",
verbose = FALSE)
selected_passangers <- select_sample(titanic_imputed, n = 100)
cp_rf <- ceteris_paribus(explainer_rf, selected_passangers)
cp_rf
plot(cp_rf, variables = "age", color = "grey") +
show_observations(cp_rf, variables = "age", color = "black") +
show_rugs(cp_rf, variables = "age", color = "red")