Layer-wise Relevance Propagation

The research of the eXplainable AI group focuses on the development of algorithmic methods to understand and visualize the predictions of state-of-the-art AI models. In particular, our group, together with collaborators from TU Berlin and the University of Oslo, introduced in 2015 a method to explain the predictions of deep convolutional neural networks (CNNs) and kernel machines (SVMs) called Layer-wise Relevance Propagation (LRP) [1]. The method has since been extended to other types of AI models, such as Recurrent Neural Networks (RNNs) [2], One-Class Support Vector Machines [3], and k-means clustering [4]. LRP is based on a relevance conservation principle and leverages the structure of the model to decompose its prediction.
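
As a brief illustration of this conservation principle (the layer indexing below is generic notation chosen for exposition, not tied to a particular architecture), the relevance assigned to the neurons of each layer sums to the model output:

```latex
\sum_{i} R_i^{(1)} \;=\; \dots \;=\; \sum_{j} R_j^{(l)} \;=\; \sum_{k} R_k^{(l+1)} \;=\; \dots \;=\; f(x)
```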

First, a standard forward pass is performed and the model computes its prediction. Then the model's output is propagated backwards through the network layer by layer, following the neural pathways involved in the final prediction and applying specific LRP decomposition rules. The result is a heatmap indicating how much each input feature (e.g., each pixel) contributed to the prediction; such a heatmap can be computed not only for the actual prediction but for any other output of interest. LRP has a theoretical justification rooted in Taylor decomposition [5]. Besides developing new explanation methods, our group also investigates quantitative methods to evaluate and validate explanations, e.g., via pixel-perturbation analysis [6] or controlled tasks with ground-truth annotations [7].
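
As a rough illustration of this forward/backward procedure, the sketch below applies a simple epsilon-style LRP rule to a toy fully-connected ReLU network in NumPy. The network, its weights, and names such as `lrp_epsilon` and `epsilon` are illustrative assumptions, not a reference implementation of the method.

```python
# A minimal sketch of LRP with an epsilon-style rule on a toy network.
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 4 inputs -> 6 hidden units -> 3 outputs, ReLU activations (assumed).
weights = [rng.normal(size=(4, 6)), rng.normal(size=(6, 3))]
biases = [np.zeros(6), np.zeros(3)]

def forward(x):
    """Standard forward pass; keeps the activations of every layer."""
    activations = [x]
    for W, b in zip(weights, biases):
        x = np.maximum(0.0, x @ W + b)  # ReLU layer
        activations.append(x)
    return activations

def lrp_epsilon(activations, target, epsilon=1e-6):
    """Backward pass: redistribute the target output score layer by layer."""
    # Start from the score of the output we want to explain.
    relevance = np.zeros_like(activations[-1])
    relevance[target] = activations[-1][target]

    for layer in reversed(range(len(weights))):
        a, W = activations[layer], weights[layer]
        z = a @ W + biases[layer] + epsilon   # neuron inputs plus stabilizer
        s = relevance / z                     # normalized relevance per upper neuron
        relevance = a * (s @ W.T)             # redistribute to the lower layer
    return relevance                          # one relevance value per input feature

x = rng.normal(size=4)
acts = forward(x)
print("input relevances:", lrp_epsilon(acts, target=0))
```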

To see LRP in action across various data domains and problem scales, visit our interactive demos.

Furthermore, LRP provides a reliable basis for semi-automated techniques that inspect explanations at large scale and identify undesirable behaviors of machine learning models (so-called Clever-Hans behaviors), such as Spectral Relevance Analysis [8]. The ultimate goal is to un-learn such behaviors [9] and thereby functionally clean the model.
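
A minimal sketch of how such a large-scale inspection might look is given below, assuming a stack of precomputed LRP heatmaps (random placeholders here) and using scikit-learn's spectral clustering as a stand-in for the analysis step; names such as `heatmaps` and the preprocessing choices are hypothetical.

```python
# Sketch of a SpRAy-style workflow: cluster many explanations to surface
# groups of structurally similar heatmaps that may indicate Clever-Hans behavior.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Placeholder: 200 heatmaps of size 32x32 (in practice, LRP maps for one class).
heatmaps = rng.normal(size=(200, 32, 32))

# Simple preprocessing: downsample by block-averaging and flatten each map.
coarse = heatmaps.reshape(200, 8, 4, 8, 4).mean(axis=(2, 4)).reshape(200, -1)

# Spectral clustering groups similar explanation patterns; unusually coherent
# clusters are candidates for manual inspection.
labels = SpectralClustering(n_clusters=4, affinity="nearest_neighbors",
                            random_state=0).fit_predict(coarse)
print("cluster sizes:", np.bincount(labels))
```

The clusters themselves carry no labels of "good" or "bad" behavior; they merely narrow down which groups of explanations a human should inspect to decide whether the model relies on a valid strategy or on an artifact of the training data.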