Benchmarking perturbation-based saliency maps for explaining Atari agents

Huber, Tobias; Limmer, Benedikt; André, Elisabeth

doi:10.3389/frai.2022.903875

One of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel attributed to the agents' decision. However, there is no work that computationally evaluates and compares the fidelity of different perturbation-based saliency map approaches specifically for DRL agents. It is particularly challenging to computationally evaluate saliency maps for DRL agents since their decisions are part of an overarching policy, which includes long-term decision making. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state as well as the expected future reward after doing each action in this state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2,600 games. The approaches areOne of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel attributed to the agents' decision. However, there is no work that computationally evaluates and compares the fidelity of different perturbation-based saliency map approaches specifically for DRL agents. It is particularly challenging to computationally evaluate saliency maps for DRL agents since their decisions are part of an overarching policy, which includes long-term decision making. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state as well as the expected future reward after doing each action in this state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2,600 games. The approaches are compared using two computational metrics: dependence on the learned parameters of the underlying deep Q-network of the agents (sanity checks) and fidelity to the agents' reasoning (input degradation). During the sanity checks, we found that a popular noise-based saliency map approach for DRL agents shows little dependence on the parameters of the output layer. We demonstrate that this can be fixed by tweaking the algorithm such that it focuses on specific actions instead of the general entropy within the output values. For fidelity, we identify two main factors that influence which saliency map approach should be chosen in which situation. Particular to value-based DRL agents, we show that analyzing the agents' choice of action requires different saliency map approaches than analyzing the agents' state value estimation.… show more

Author:	Tobias Huber ORCiD GND, Benedikt Limmer, Elisabeth André ORCiD GND
URN:	urn:nbn:de:bvb:384-opus4-924812
Frontdoor URL	https://opus.bibliothek.uni-augsburg.de/opus4/92481
ISSN:	2624-8212OPAC
Parent Title (English):	Frontiers in Artificial Intelligence
Publisher:	Frontiers Media S.A.
Type:	Article
Language:	English
Date of Publication (online):	2022/02/04
Year of first Publication:	2022
Publishing Institution:	Universität Augsburg
Release Date:	2022/02/04
Volume:	5
First Page:	903875
DOI:	https://doi.org/10.3389/frai.2022.903875
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Dewey Decimal Classification:	0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):	CC-BY 4.0: Creative Commons: Namensnennung

Open Access

Benchmarking perturbation-based saliency maps for explaining Atari agents

Download full text files

Export metadata

Statistics

Additional Services