Technical Papers

Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models

Giorgio Visania, Enrico Bagli, Federico Chesani, Alessandro Poluzzi and Davide Capuzzo

ABSTRACT of Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models

Business processes are shifting towards a more computation-driven approach, and the ever-increasing use of Machine Learning techniques is the clearest example of this trend. This shift often brings advantages, such as higher prediction accuracy and a shorter time to obtain results. However, these methods present a major drawback: it is very difficult to understand on what grounds the algorithm made its decision. To address this issue we consider the LIME method. We give a general background on LIME, then focus on the stability issue: applying the method repeatedly, under the same conditions, may yield different explanations. Two complementary indices are proposed to measure LIME stability. It is important for the practitioner to be aware of the issue and to have a tool for spotting it. LIME explanations are reliable only if they are stable; therefore a stability assessment, made through the proposed indices, is crucial. As a case study, we apply both Machine Learning and classical statistical techniques to Credit Risk data. We test LIME on the Machine Learning algorithm and check its stability. Finally, we examine the quality of the explanations returned.

Introduction to Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models

Nowadays, more and more interest is devoted to the concept of "learning from the data", i.e. using the data collected about the process to predict its outcome (Hastie, Tibshirani, & Friedman, 2009). The main ingredients of its recent success are the huge availability of data sources and the increased computational power, which allows complex algorithms to deliver results in a relatively short time.

In statistics, making predictions about the future is a particularly relevant topic. To address it, simple algorithms and methods have been developed over the years, the most famous being Linear Regression and Generalised Linear Models (Greene, 2003). With the advent of powerful computing tools, however, more sophisticated techniques have emerged. In particular, Machine Learning models are able to perform intelligent tasks usually done by humans, supporting the automation of data-driven processes.

Despite their enhanced accuracy, Machine Learning models show weaknesses, especially when it comes to interpretability, i.e. "the ability to explain or to present the results, in understandable terms, to a human" (Hall & Gill, 2018). They usually adopt large model structures and refine the prediction over a huge number of iterations. The logic underlying the model ends up hidden under potentially many layers of mathematical calculations, and scattered across an architecture too vast for humans to grasp.

To achieve interpretability, quite a few techniques have been proposed in recent literature. These approaches can be grouped according to different criteria (Molnar, 2020a; Guidotti et al., 2018), such as i) model agnostic or model specific, ii) local, global or example based, iii) intrinsic or post-hoc, iv) perturbation or saliency based. Herein, we focus on LIME (Local Interpretable Model-agnostic Explanations), a local interpretability framework developed by Ribeiro, Singh, & Guestrin (2016).

The technique may suffer from a lack of stability: repeated applications of the method under the same conditions may produce different results. This is a particularly delicate issue, yet it is rarely taken into consideration. Even worse, the issue often goes unnoticed, e.g. when the method is called only once and the result is accepted without further checks. In this paper, we introduce a pair of complementary stability indices, useful to measure LIME stability and spot potential issues. They represent an innovative contribution to the scientific community, addressing an important research question.

The indices are calculated on repeated calls of the method, to evaluate the similarity of the results. They may be applied to any trained LIME method and make the practitioner aware of potential instability in the results or, conversely, confirm that the trained method is consistent.
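The idea of comparing repeated calls can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the indices proposed in the paper: the black-box function, the Gaussian sampling scheme, the kernel width, and the top-k Jaccard score are all hypothetical choices made for the example, and the surrogate is a plain weighted linear regression rather than a call to the LIME library.

```python
import math
import random
from itertools import combinations


def black_box(x):
    """Hypothetical nonlinear model standing in for a trained ML model."""
    return math.tanh(2.0 * x[0]) + 0.5 * x[1] ** 2 + 0.1 * x[2]


def solve(a, b):
    """Solve the linear system a @ w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_surrogate(model, x0, n_samples, rng, scale=0.5, width=0.75):
    """LIME-style step (assumed form): sample around x0, weight the samples
    by proximity to x0, and fit a weighted linear model to the predictions."""
    d = len(x0)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        w = math.exp(-dist2 / width ** 2)  # assumed Gaussian proximity kernel
        row = [1.0] + z  # leading 1.0 is the intercept column
        y = model(z)
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)[1:]  # drop the intercept, keep feature coefficients


def repeated_explanations(model, x0, n_calls, n_samples, seed):
    """Call the explainer n_calls times under identical settings."""
    rng = random.Random(seed)
    return [local_surrogate(model, x0, n_samples, rng) for _ in range(n_calls)]


def top_k(coefs, k):
    """Indices of the k features with the largest absolute coefficient."""
    order = sorted(range(len(coefs)), key=lambda i: -abs(coefs[i]))
    return frozenset(order[:k])


def mean_pairwise_jaccard(sets):
    """Illustrative similarity score in [0, 1]; 1 means every call agrees."""
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    x0 = [0.2, 1.0, -0.5]
    for n_samples in (50, 5000):
        runs = repeated_explanations(black_box, x0, n_calls=10,
                                     n_samples=n_samples, seed=0)
        score = mean_pairwise_jaccard([top_k(c, 2) for c in runs])
        print(f"n_samples={n_samples}: top-2 agreement = {score:.2f}")
```

A score near 1 means the repeated explanations select the same features; a lower score is the kind of warning sign that a single call to the method would never reveal.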

A brief introduction to explainability techniques is presented in Chapter 2. The LIME technique is analysed in depth in Chapter 3, including its weak points. A thorough discussion of LIME stability can be found in Chapter 4, along with a description of some recent works tackling the issue. Our proposal is discussed extensively in Chapter 5. Finally, a practical application of the method in the Credit Risk Modelling field is shown in Chapter 6. Chapter 7 is dedicated to Discussion and Conclusions. The code used for the experiments is available at https://github.com/giorgiovisani/LIME_stability.
