Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin
Berlin - In 2015, the U.S. Defense Advanced Research Projects Agency (DARPA) launched the “Explainable Artificial Intelligence” (XAI) program. Its goal was to give users a better understanding of AI systems, greater trust in them, and more effective control over their behavior. That same year, researchers at the Technical University of Berlin (TU Berlin) and BIFOLD, together with the Fraunhofer Heinrich Hertz Institute (HHI), developed Layer-wise Relevance Propagation (LRP), the first method to systematically explain the decisions of neural networks. Today, XAI techniques are not only widespread in research and industry but also form part of regulatory frameworks, such as the “right to explanation” in the GDPR and the transparency requirements of the EU AI Act. National AI Day on July 16 offers a good moment to reflect on the past and look ahead: what role do Berlin-based institutions like TU Berlin, Fraunhofer HHI, and BIFOLD play?
The desire to peer inside the AI “black box” to understand what it has learned and how it makes decisions is nearly as old as AI research itself. With the rise of deep learning in 2012, a paradigm shift occurred: AI models became vastly more complex, and their inner workings increasingly opaque. Today, large language models in particular, some with tens or even hundreds of billions of trainable parameters, have reached a level of complexity often compared to that of the human brain. Researchers at TU Berlin, Fraunhofer HHI, and BIFOLD have played a pivotal role in shaping the field of Explainable AI from the very beginning.
Explaining Individual Predictions (2012–2018)
Layer-wise Relevance Propagation (LRP) was developed in 2015 by a team led by BIFOLD co-director Prof. Dr. Klaus-Robert Müller, head of the Machine Learning Group at TU Berlin, and Prof. Dr. Wojciech Samek, head of the Artificial Intelligence Department at HHI, professor at TU Berlin, and BIFOLD Fellow. “Our goal was to make individual model decisions transparent and interpretable. The method uses so-called heatmaps to visualize which input features (e.g., image pixels or words in a text) contributed to the model’s prediction and to what extent,” explains Wojciech Samek. At the time, LRP was the first general method for systematically explaining the decisions of neural networks, and it is still used worldwide. The core idea is to propagate the model’s decision backward through the network, so that neurons that contributed more to the prediction receive higher relevance scores. The method is extremely efficient and can be applied even to large language models with billions of parameters.

In practice, such explanations can uncover problematic behavior. For example, it was shown that some models, despite strong performance, did not truly “understand” their tasks but instead “cheated” in surprisingly effective ways. One well-known case: an image classification model learned to identify horse images not by the animals themselves, but by a copyright watermark frequently present in such pictures.
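To make the core idea concrete, below is a minimal NumPy sketch of the backward pass using the epsilon-variant of the LRP rule on a toy two-layer ReLU network. The network, its random weights, and the choice of the epsilon-rule alone are illustrative assumptions; in practice, LRP is applied to trained models, often with layer-specific rules.

```python
import numpy as np

def lrp_epsilon(W, b, a, R, eps=1e-6):
    """One LRP epsilon-rule step: redistribute output relevance R onto the
    layer's inputs a, proportionally to each input's contribution a_i * W_ij."""
    z = a @ W + b                                    # pre-activations, shape (n_out,)
    s = R / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized ratio per output
    return a * (W @ s)                               # relevance per input, shape (n_in,)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)        # toy 4-3-2 ReLU network
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)        # with illustrative random weights

x = rng.normal(size=4)                               # input features
a1 = np.maximum(0, x @ W1 + b1)                      # hidden activations
out = a1 @ W2 + b2                                   # class logits

R_out = np.zeros_like(out)
R_out[np.argmax(out)] = out.max()                    # start from the predicted class

R_hidden = lrp_epsilon(W2, b2, a1, R_out)            # back through layer 2
R_input = lrp_epsilon(W1, b1, x, R_hidden)           # back through layer 1
print(np.round(R_input, 3), "sum:", round(R_input.sum(), 3))
```

With zero biases and a small epsilon, the input relevances sum (approximately) to the explained logit; this conservation property is what lets the scores be read as contributions to the prediction.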
Understanding the Model (2018–2023)
The second wave of XAI research aimed to understand the internal mechanisms of AI models: not just what they respond to, but how they function internally and which concepts they have learned. The research teams at TU Berlin, Fraunhofer HHI, and BIFOLD developed a number of novel analysis tools for this purpose. Notable contributions include Concept Relevance Propagation (CRP) and Disentangled Relevant Subspace Analysis (DRSA). Both methods build on LRP but significantly expand it: instead of only analyzing input relevance, they also investigate the roles of individual neurons and neural substructures. In practice, CRP made it possible, for example, to visualize the concepts that a neural network trained to classify Alzheimer’s disease from quantitative MRI data had associated with “diseased” or “healthy”, and to compare them with medically established brain regions. This form of explainability is essential not only for deploying AI in medicine but also for making its decisions scientifically interpretable.
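As a rough illustration of how such neuron-level analyses extend LRP, the sketch below (continuing the toy example above and reusing its lrp_epsilon helper, weights, and activations) conditions the backward pass on a single hidden unit, so that each unit yields its own input heatmap. This is a deliberate simplification for illustration, not a faithful reimplementation of CRP, which conditions on concept-encoding channels of trained networks.

```python
def concept_conditional_relevance(concept_idx):
    """Input relevance restricted to what flows through one hidden unit,
    treated here as a stand-in for a learned 'concept'."""
    R_hidden = lrp_epsilon(W2, b2, a1, R_out)   # as in the full backward pass
    mask = np.zeros_like(R_hidden)
    mask[concept_idx] = 1.0                     # zero out all other hidden units
    return lrp_epsilon(W1, b1, x, R_hidden * mask)

# One heatmap per hidden unit; because the backward pass is linear in the
# relevance, the per-concept heatmaps sum to the unconditional explanation.
for j in range(3):
    print(f"concept {j}:", np.round(concept_conditional_relevance(j), 3))
```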
Holistic Understanding (2023–Present)
“Today, the goal is to gain a systematic and comprehensive understanding of AI models, their behavior, and their internal representations. In this context, we recently released ‘SemanticLens’, a system that aims to make the function and quality of each individual model component, i.e., each neuron, transparent,” explains Wojciech Samek. “The concept is perhaps best compared to a complex technical system like an Airbus A340-600, which consists of over four million individual parts. Aircraft engineers must understand and document the function and reliability of each part to ensure the system as a whole can be checked and trusted. In contrast, the role of individual neurons in AI models has remained largely unclear, making automated validation and reliability assessments difficult.” SemanticLens closes precisely this gap: for the first time, the function of each component can be inspected and validated systematically. This new quality of model inspection marks a decisive step toward auditable, trustworthy, and controllable AI systems, especially in safety-critical domains.
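SemanticLens itself uses foundation models to describe and audit what each component encodes; purely as a hypothetical, much simpler illustration of component-level inspection, the snippet below (again continuing the toy example) characterizes each hidden unit by the dataset examples that activate it most strongly.

```python
# Hypothetical illustration, not the SemanticLens method: summarize each
# hidden unit by its top-activating examples from a stand-in dataset.
X = rng.normal(size=(1000, 4))                 # stand-in "dataset"
H = np.maximum(0, X @ W1 + b1)                 # hidden activations, shape (1000, 3)
for j in range(H.shape[1]):
    top = np.argsort(H[:, j])[-3:][::-1]       # indices of strongest examples
    print(f"unit {j}: top examples {top}, activations {np.round(H[top, j], 2)}")
```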
What Will the Next 10 Years Bring?
“In the coming years, the focus of XAI will shift: away from purely post-hoc analyses—explanations after the fact—toward interactive, integrative approaches that treat explainability as a core element of human-AI interaction,” says Wojciech Samek. New and pressing questions are emerging: What kind of explanations are helpful to users in which contexts? What should an explainable interface look like? And how can information flow from humans back into the model be designed to prevent misunderstandings or allow intervention? Explainability is evolving from a diagnostic tool into an active control mechanism—an essential step for the responsible and safe use of modern AI systems. Another promising research direction lies in the scientific use of explainable AI: using explanatory models to gain insights in the natural sciences, life sciences, and humanities. Researchers at TU Berlin, Fraunhofer HHI, and BIFOLD have already made important contributions in fields such as cancer research, quantum chemistry, and historical studies.
Publication: DOI 10.48550/arXiv.2501.05398
Interactive demo: https://www.hhi.fraunhofer.de/en/departments/ai/technologies-and-solutions/semanticlens.html
About Fraunhofer HHI:
Fraunhofer HHI (Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute) is a world-leading research institute that is helping to shape the digital future. It drives innovation in the fields of video, AI, computer vision, photonics, and wireless communication, with technologies that have a significant impact on science, business, and society.
Fraunhofer HHI develops practical solutions with added value for society across a wide range of application areas, including medicine, agriculture, critical infrastructure, disaster management, energy, mobility, and more.