Going Beyond Saliency Maps: Training Deep Models to Interpret Deep Models. Academic Article uri icon

Overview

abstract

  • Interpretability is a critical factor in applying complex deep learning models to advance the understanding of brain disorders in neuroimaging studies. To interpret the decision process of a trained classifier, existing techniques typically rely on saliency maps to quantify the voxel-wise or feature-level importance for classification through partial derivatives. Despite providing some level of localization, these maps are not human-understandable from the neuroscience perspective as they often do not inform the specific type of morphological changes linked to the brain disorder. Inspired by the image-to-image translation scheme, we propose to train simulator networks to inject (or remove) patterns of the disease into a given MRI based on a warping operation, such that the classifier increases (or decreases) its confidence in labeling the simulated MRI as diseased. To increase the robustness of training, we propose to couple the two simulators into a unified model based on conditional convolution. We applied our approach to interpreting classifiers trained on a synthetic dataset and two neuroimaging datasets to visualize the effect of Alzheimer's disease and alcohol dependence. Compared to the saliency maps generated by baseline approaches, our simulations and visualizations based on the Jacobian determinants of the warping field reveal meaningful and understandable patterns related to the diseases.

publication date

  • June 14, 2021

Identity

PubMed Central ID

  • PMC8451265

Scopus Document Identifier

  • 85111438656

Digital Object Identifier (DOI)

  • 10.1007/978-3-030-78191-0_6

PubMed ID

  • 34548772

Additional Document Info

volume

  • 12729