CUED Publications database

You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods.

Dimanov, B and Bhatt, U and Jamnik, M and Weller, A (2020) You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods. In: ECAI European Conference on Artificial Intelligence, 2020-08-29 to 2020-09-08, pp. 63-73.

Full text not available from this repository.

Abstract

Transparency of algorithmic systems is an important area of research, which has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME [23], even suggests that model explanations can answer the question “Why should I trust you?”. Here we show a straightforward method for modifying a pre-trained model to manipulate the output of many popular feature importance explanation methods with little change in accuracy, thus demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model’s discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check the fairness of a model.
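
To make the kind of attack the abstract describes concrete, here is a minimal, hypothetical sketch (not the authors' exact method): a pre-trained classifier is fine-tuned with an extra penalty on the gradient-based (saliency) attribution of a sensitive feature, so that feature appears unimportant to such explanation methods while accuracy changes little. All names and values (SENSITIVE_IDX, lam, the toy model and data) are illustrative assumptions.

# Illustrative sketch of an explanation-masking fine-tune (assumptions noted above).
import torch
import torch.nn as nn

torch.manual_seed(0)

SENSITIVE_IDX = 0   # column index of the sensitive feature (assumption)
lam = 10.0          # weight of the explanation-masking penalty (assumption)

# Toy "pre-trained" model and data standing in for a real classifier.
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
X = torch.randn(256, 5)
y = (X[:, SENSITIVE_IDX] > 0).long()  # labels leak the sensitive feature

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
task_loss = nn.CrossEntropyLoss()

for step in range(200):
    X_in = X.clone().requires_grad_(True)
    logits = model(X_in)

    # Saliency-style attribution: gradient of the predicted-class score
    # with respect to the inputs.
    score = logits.gather(1, logits.argmax(1, keepdim=True)).sum()
    grads, = torch.autograd.grad(score, X_in, create_graph=True)

    # Penalize attribution mass on the sensitive feature only; the model
    # can keep using that feature while explanations suggest otherwise.
    masking_penalty = grads[:, SENSITIVE_IDX].pow(2).mean()

    loss = task_loss(logits, y) + lam * masking_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

After such fine-tuning, a saliency map would assign near-zero importance to the sensitive feature even though the decision boundary still depends on it, which is the danger the abstract raises for fairness auditing via feature importance explanations.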

Item Type: Conference or Workshop Item (UNSPECIFIED)
Subjects: UNSPECIFIED
Divisions: Div F > Computational and Biological Learning
Depositing User: Cron Job
Date Deposited: 29 Jan 2020 20:23
Last Modified: 15 Jul 2021 05:56
DOI: 10.3233/FAIA200380