Addressing Shortcomings of Explainable Machine Learning Methods
Time: Thu 2025-02-13 13:00
Location: Ka-Sal B (Peter Weisglass), Kistagången 16, Electrum, Kista
Video link: https://kth-se.zoom.us/j/66054420196
Language: English
Subject area: Information and Communication Technology
Doctoral student: Amr Alkhatib, Programvaruteknik och datorsystem, SCS
Opponent: Professor Kary Främling, Umeå University, Umeå, Sweden
Supervisor: Professor Henrik Boström, Programvaruteknik och datorsystem, SCS
Abstract
Recently, machine learning algorithms have achieved state-of-the-art performance in real-life applications across various domains, but such algorithms tend to produce non-interpretable models. However, users often require an understanding of the reasoning behind predictions to trust the models and use them in decision-making. Explainable machine learning has therefore gained attention as a way to achieve transparency while retaining the performance of state-of-the-art algorithms. Various methods have been proposed as post-hoc remedies to explain black-box models, but such techniques are constrained in their ability to provide comprehensive and faithful insight into the prediction process. For instance, many explanation methods based on additive importance scores generate explanations without any assurance that the provided explanation reflects the model's reasoning. Rule-based explanation methods, in turn, can produce excessively specific explanations that occasionally exhibit poor fidelity, i.e., they lack faithfulness to the underlying black-box model. Furthermore, explanation methods are generally computationally expensive, making their application impractical in many real-world situations.
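To make the notions of additive importance scores and fidelity concrete, the following minimal sketch (an illustration added here, not a method from the thesis) fits a local linear surrogate around a single instance of a black-box classifier and measures fidelity as the surrogate's squared error against the black-box outputs; the dataset, models, perturbation scale, and all variable names are assumptions chosen for illustration.

# Minimal sketch (illustrative assumptions, not the thesis methods): an additive
# importance-score explanation from a local linear surrogate, with a simple
# fidelity check against the black-box predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]                                        # instance to explain
rng = np.random.default_rng(0)
neighbourhood = x + rng.normal(scale=0.3, size=(200, X.shape[1]))
bb_output = black_box.predict_proba(neighbourhood)[:, 1]

surrogate = Ridge(alpha=1.0).fit(neighbourhood, bb_output)
importance_scores = surrogate.coef_             # additive importance scores for x

# Fidelity: how faithfully the surrogate reproduces the black box locally.
fidelity_mse = np.mean((surrogate.predict(neighbourhood) - bb_output) ** 2)
print(importance_scores, fidelity_mse)

In this framing, low fidelity shows up as a large squared error between the surrogate and the black box, which is one of the shortcomings the abstract refers to.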
We aim to tackle several key limitations of explainable machine learning methods, with a focus on (i) low fidelity, (ii) the absence of validity guarantees, i.e., explanations provided without a pre-specified error rate, and (iii) high computational cost. Firstly, we propose a method that summarizes local explanations into a concise set of characteristic rules that can be evaluated with respect to their fidelity. We also investigate the use of Venn prediction to quantify the uncertainty of rule-based explanations. In addition, we propose to estimate the accuracy of approximate explanations and to establish error bounds for the accuracy estimates using the conformal prediction framework. Secondly, we propose a method that approximates any score-based explanation technique using computationally efficient regression models and produces error bounds around the approximated importance scores using conformal regression. Moreover, we propose a novel method to approximate Shapley value explanations in real time, achieving high similarity to the ground truth while using a limited amount of data. Thirdly, we propose a method that restricts graph neural networks to generate inherently interpretable models, thereby saving the time and resources required for post-hoc explanations while maintaining high fidelity. We also extend the graph neural network approach to handle heterogeneous tabular data. Finally, we present a method that learns a function for computing Shapley values, from which the predictions are obtained directly by summation; that is, the Shapley values can be computed before the predictions are made.
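As an illustration of how the conformal prediction framework can attach error bounds at a pre-specified error rate to a computationally efficient approximator, the sketch below applies split conformal regression to a cheap regression model that approximates importance scores; the synthetic scores, the choice of GradientBoostingRegressor, and alpha = 0.1 are assumptions made for illustration and do not reproduce the thesis methods.

# Hypothetical sketch (assumptions for illustration, not the thesis method):
# split conformal regression around a cheap model that approximates the
# importance score of one feature produced by an expensive explainer.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
# Stand-in for importance scores from an expensive explainer (synthetic here).
scores = 0.8 * X[:, 0] - 0.3 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=2000)

X_train, X_rest, s_train, s_rest = train_test_split(X, scores, test_size=0.5, random_state=0)
X_cal, X_test, s_cal, s_test = train_test_split(X_rest, s_rest, test_size=0.5, random_state=0)

approximator = GradientBoostingRegressor(random_state=0).fit(X_train, s_train)

alpha = 0.1                                            # pre-specified error rate
residuals = np.abs(s_cal - approximator.predict(X_cal))
level = np.ceil((1 - alpha) * (len(residuals) + 1)) / len(residuals)
q = np.quantile(residuals, level)                      # calibrated half-width

pred = approximator.predict(X_test)
lower, upper = pred - q, pred + q                      # error bounds around scores
coverage = np.mean((s_test >= lower) & (s_test <= upper))
print(f"empirical coverage: {coverage:.3f} (target {1 - alpha})")

Under the usual exchangeability assumption of conformal prediction, the empirical coverage should land close to the 90% target, which mirrors the kind of validity guarantee referred to in the abstract.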
Empirical investigations of the proposed methods suggest that the fidelity of approximated explanations can vary with the black-box predictor, the dataset, and the explanation method. The conformal prediction framework can reliably control the error level when timely explanations are required. Furthermore, constraining graph neural networks to produce inherently explainable models does not necessarily compromise predictive performance and can reduce the time and resources needed for post-hoc explanations.