Toward practical human-interpretable explanations

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Model-agnostic feature attribution techniques are used to explain the decisions of complex machine learning (ML) models, including ensemble models and deep neural networks (DNNs). However, because complex ML models perform best when trained on low-level features, the explanations these algorithms generate are often not interpretable or usable by humans. Recently proposed model-agnostic methods that support the generation of human-interpretable explanations are impractical because they require a fully invertible transformation function that maps the model's input features to human-interpretable features. While some practical human-interpretable explainability methods exist (e.g., concept-based methods), they typically require direct access to the model and are therefore not fully model-agnostic. In this paper, we introduce Latent SHAP, a model-agnostic black-box feature attribution framework that provides human-interpretable explanations without requiring a fully invertible transformation function. We validate the fidelity of Latent SHAP's explanations through quantitative faithfulness assessments on two controlled datasets: a self-generated artificial dataset and the dSprites dataset. Furthermore, we demonstrate the practical utility of Latent SHAP in real-world scenarios across domains such as computer vision, natural language processing, and cybersecurity. Each domain involves complex models (ensembles, DNNs, and LLMs) for which invertible transformation functions are not available.
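To make the limitation the abstract describes concrete, the following is a minimal sketch (not the paper's actual algorithm) of the baseline setting that requires full invertibility: a black-box model f operates on low-level features x, a transform t maps human-interpretable features z to x, and attributions for z are obtained by explaining g(z) = f(t(z)). All names (f, g, t, the toy linear model, and the brute-force Shapley routine) are illustrative assumptions; with a non-invertible t this construction breaks down, which is the gap Latent SHAP targets.

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(0)
n = 3                                    # number of interpretable features
A = rng.normal(size=(n, n)) + np.eye(n)  # toy invertible linear transform t
w = rng.normal(size=n)                   # weights of a toy linear model f

def f(x):
    """Black-box model on low-level features x."""
    return float(x @ w)

def g(z):
    """Model viewed through the transform: g = f ∘ t, with t(z) = A @ z."""
    return f(A @ z)

def shapley(g, z, baseline):
    """Exact Shapley values of g at z against a baseline (brute force).

    Absent features are replaced by their baseline values; feasible only
    for small n, since it enumerates all 2^(n-1) coalitions per feature.
    """
    n = len(z)
    phi = np.zeros(n)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(rest, size):
                zS = baseline.copy()
                zS[list(S)] = z[list(S)]          # coalition S present
                zSi = zS.copy()
                zSi[i] = z[i]                     # add feature i
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (g(zSi) - g(zS))
    return phi

z = np.array([1.0, -2.0, 0.5])
baseline = np.zeros(n)
phi = shapley(g, z, baseline)
# For a linear g, phi_i = (A.T @ w)_i * (z_i - baseline_i), and the
# attributions sum to g(z) - g(baseline) (efficiency property).
```

The key point of the sketch: every coalition evaluation calls g, and g needs some way to turn interpretable features back into model inputs. Here that is trivial because t is linear and invertible; for realistic transforms (e.g., pixels to concepts) no such inverse exists.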

Original language: English
Article number: 209
Journal: Machine Learning
Volume: 114
Issue number: 9
DOIs
State: Published - 1 Sep 2025

Keywords

  • Explainability
  • Explainable ML
  • Machine learning
  • XAI algorithms

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

