When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016).
There once was a package called lime,
Whose models were simply sublime,
It gave explanations for their variations,
one observation at a time.

lime-rick by Mara Averick
This is an R port of the Python lime package (https://github.com/marcotcr/lime) developed by the authors of the lime (Local Interpretable Model-agnostic Explanations) approach for black-box model explanations. All credit for the invention of the approach goes to the original developers.
The purpose of lime is to explain the predictions of black box classifiers. What this means is that for any given prediction and any given classifier it is able to determine a small set of features in the original data that have driven the outcome of the prediction. To learn more about the methodology of lime, read the paper and visit the repository of the original implementation.
The lime package for R does not aim to be a line-by-line port of its Python counterpart. Instead, it takes the ideas laid out in the original code and implements them in an API that is idiomatic to R.
Out of the box, lime supports a wide range of models, e.g. those created with caret, parsnip, and mlr. Support for unsupported models is easy to achieve by adding a predict_model and model_type method for the given model.
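As a minimal sketch of what that looks like, assume a fitted model object of a hypothetical class "my_model" whose predict() method returns a matrix of class probabilities; both the class name and the shape of its predictions are assumptions for illustration:

library(lime)

# Tell lime whether the model is a classifier or a regressor
model_type.my_model <- function(x, ...) {
  "classification"
}

# Tell lime how to obtain predictions in the format it expects:
# for classification, a data.frame with one probability column per class
predict_model.my_model <- function(x, newdata, type, ...) {
  # Assumed behaviour of the hypothetical model: predict() returns a
  # matrix of class probabilities with one column per class
  as.data.frame(predict(x, newdata = newdata, ...))
}

Once these two methods exist, the model can be passed to lime() and explain() like any natively supported one.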
The following shows how a random forest model is trained on the iris data set and how lime is then used to explain a set of new observations:
library(caret)
library(lime)

# Split up the data set
iris_test <- iris[1:5, 1:4]
iris_train <- iris[-(1:5), 1:4]
iris_lab <- iris[[5]][-(1:5)]

# Create Random Forest model on iris data
model <- train(iris_train, iris_lab, method = 'rf')

# Create an explainer object
explainer <- lime(iris_train, model)

# Explain new observation
explanation <- explain(iris_test, explainer, n_labels = 1, n_features = 2)

# The output is provided in a consistent tabular format and includes the
# output from the model.
explanation
#> # tibble [10 × 13]
#>    model_type case  label label_prob model_r2 model_intercept
#>    <chr>      <chr> <chr>      <dbl>    <dbl>           <dbl>
#>  1 classific… 1     seto…          1    0.340           0.263
#>  2 classific… 1     seto…          1    0.340           0.263
#>  3 classific… 2     seto…          1    0.336           0.259
#>  4 classific… 2     seto…          1    0.336           0.259
#>  5 classific… 3     seto…          1    0.361           0.258
#>  6 classific… 3     seto…          1    0.361           0.258
#>  7 classific… 4     seto…          1    0.364           0.247
#>  8 classific… 4     seto…          1    0.364           0.247
#>  9 classific… 5     seto…          1    0.343           0.256
#> 10 classific… 5     seto…          1    0.343           0.256
#> # ... with 7 more variables: model_prediction <dbl>, feature <chr>,
#> #   feature_value <dbl>, feature_weight <dbl>, feature_desc <chr>,
#> #   data <list>, prediction <list>

# And can be visualised directly
plot_features(explanation)
lime also supports explaining image and text models. For image explanations, the relevant areas in an image can be highlighted:
explanation <- .load_image_example()
plot_image_explanation(explanation)
Here we see that the second most probable class is hardly correct, but is due to the model picking up waxy areas of the produce and interpreting them as a wax-like surface.
For text, the explanation can be shown by highlighting the important words. The package even includes a shiny application for interactively exploring text models.
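As a rough sketch of that workflow (text_model, train_text, and test_text below are placeholders for your own fitted text classifier and character vectors, not objects defined in this document):

# Build an explainer from the training documents and the fitted classifier
explainer <- lime(train_text, text_model)

# Explain a handful of new documents
explanation <- explain(test_text, explainer, n_labels = 1, n_features = 3)

# Highlight the influential words in each document
plot_text_explanations(explanation)

# Launch the shiny application for interactive exploration
interactive_text_explanations(explainer)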
lime is available on CRAN and can be installed using the standard approach:
install.packages('lime')
To get the development version, install from GitHub instead:
# install.packages('devtools')
devtools::install_github('thomasp85/lime')
Recent changes to the package include:

* Support for parsnip and ranger models
* A preprocess argument for lime.data.frame to keep it in line with the other types. Use it to transform your data.frame into a new input that your model expects after permutations
* magick is now only in Suggests to cut down on heavy hard dependencies
* explain now returns a tbl_df so you get pretty printing if you have tibble loaded
* plot_features now has a cases argument for subsetting the data before plotting
* plot_image_explanation (#35)
* keras package
* as_classifier() and as_regressor() for ad-hoc specification of the model type in case the heuristic implemented in lime doesn't hold. as_classifier() also lets you add/overwrite the class labels (see the sketch after this list)
* gower is now the default similarity measure for tabular data
* With bin_continuous = FALSE the default behavior is now to sample from a kernel density estimation rather than assume a normal distribution
* plot_explanations() (#60)
* plot_text_explanation() with better formatting and scrolling support for many explanations
* Added a NEWS.md file to track changes to the package
* NA values (#8)
* plot_features() (#38)
* Support for h2o (@mdancho84) (#40)
* NA values (#45)
* Support for Date and POSIXt columns. They will be kept constant during permutations so that lime will explain the model behaviour at the given timepoint based on the remaining features (#39)
* plot_explanations() for an overview plot of a large explanation set
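As a small sketch of the as_classifier()/as_regressor() helpers mentioned above, reusing the iris objects from the earlier example (the labels passed here and the names in the commented regressor line are purely illustrative):

# Force lime to treat the model as a classifier and supply the class labels
# explicitly, bypassing the model-type heuristic
explainer <- lime(iris_train, as_classifier(model, labels = levels(iris$Species)))

# For a model predicting a continuous outcome, mark it as a regressor instead
# explainer <- lime(my_train_data, as_regressor(my_regression_model))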