Law and Autonomous Systems Series: Automated Decisions Based on Profiling - Information, Explanation or Justification? That is the Question!
Within the academic community, the EU General Data Protection Regulation (GDPR) has triggered a lively debate regarding whether data subjects have a “right to explanation” of automated decisions made about them. At one end of the spectrum, we see scholars arguing that no such right exists under the GDPR but rather a “limited right to information” only. Others argue that this position is based on a very narrow reading of the relevant provisions of the GDPR, and that a contextual interpretation shows that the GDPR does indeed provide for a right to explanation with respect to automated decisions. We wholeheartedly agree with the latter interpretation and set out why below. That being said, we think that all sides to the debate are missing the broader context.
Providing individuals with upfront information on automated decision-making and its underlying logic, or with an explanation of automated decisions after these are made, is one thing. Beyond that, the GDPR’s accountability principle requires controllers to be able to demonstrate compliance with their material obligations under the GDPR, in particular that their processing of personal data meets the requirements of:
- lawfulness, fairness and transparency;
- data accuracy;
- purpose limitation;
- data minimization and storage limitation;
- automated decision-making (which requires establishing appropriate safeguards); and
- performing a data protection impact assessment.
Applying these requirements to automated decision-making requires controllers to be able to demonstrate that the correlations applied in the algorithm as “rules” for decision‑making are meaningful (e.g., no over-reliance on correlations without proven causality) and unbiased (not discriminatory) and are therefore a legitimate justification for the automated decisions about individuals. We recall that transparency to individuals and the right of individuals to, for example, access their data, primarily serve the purpose of enabling individuals to decide whether to exercise their (other) rights, such as objecting to profiling, requesting erasure or rectification of their profile or “contesting” any automated decisions relating to them. The accountability principle requires controllers to (subsequently) demonstrate their compliance with their material GDPR obligations. The question of whether the GDPR does or does not provide individuals with a right to an explanation of automated decisions relating to them is therefore missing the point that, in the end, controllers must be able to show that the correlations applied in the algorithm can legitimately be used as a justification for the automated decisions.
To give a very simple example, an explanation for the underlying logic of a decision may be that the relevant individual is from a specific ethnic minority. The individual may then contest this decision as being discriminatory. The controller will subsequently have to demonstrate that using this “rule” for the relevant decision does not constitute unlawful discrimination to be able to continue such processing.
To meet their obligations with regard to automated decision-making, controllers will need to design, develop and apply their algorithms in a transparent, predictable and verifiable manner (coined “algorithmic accountability” by Diakopoulos and Friedler). In this sense, “The algorithm did it” is not an acceptable excuse. In the words of Diakopoulos and Friedler: “Algorithmic accountability implies an obligation to report and justify algorithmic decision-making and to mitigate any negative social impacts or potential harms.”
These concerns are not limited to EU law. The U.S. Federal Trade Commission has issued recommendations that promote similar principles of lawfulness and fairness when using algorithms in decision-making, and U.S. scholars have pointed out that automated decision-making in the employment context may result in a disparate impact on protected classes, which may violate U.S. anti-discrimination laws. Some of these scholars argue that this requires assessing and addressing potential disparate impact upfront as an anti-discrimination measure. To fend off a disparate impact claim, companies must be able to show that the disparate impact is justifiable and not unlawful. This indeed requires assessing and addressing potential disparate impact upfront, at the start of the development of the automated decision-making system.
Similarly, in the EU, individuals can dispute an automated decision relating to them as being unfair, e.g., because it is discriminatory. If the controller is unable to show that the correlations applied in the algorithm are meaningful, unbiased and a legitimate justification for the relevant decision, the data protection authorities (DPAs) will likely start an investigation into the decision rules applied by the algorithm.
In the words of the Norwegian DPA in its report on artificial intelligence and privacy:
“An organization must be able to explain and document, and in some cases, demonstrate, that they process personal data in accordance with the rules (…) If the DPA suspects that the account given by an organisation is wrong or contains erroneous information, it can ask the organisation to verify the details of its routines and assessments, for example by having the organisation demonstrate how their system processes personal data. This may be necessary when, for example, there is a suspicion that an algorithm is using data that the organisation has no basis for processing, or if there is a suspicion that the algorithm is correlating data that will lead to a discriminatory result.”
What is the issue: information or explanation?
The right to information
The GDPR (Articles 13(2)(f) and 14(2)(g)) explicitly requires controllers using personal data to make automated decisions to:
- inform the individuals upfront about the automated decision-making activities; and
- provide the individuals with meaningful information about the logic involved, the significance of the decision-making and the envisaged consequences for those individuals.
“Meaningful information about the logic involved”
In its Opinion on Automated Decision-Making and Profiling, the Article 29 Working Party (WP29) acknowledges that the “growth and complexity of machine-learning can make it challenging to understand how an automated decision-making process or profiling works,” but that, despite this, “the company should find simple ways to tell the individual about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm.”
We note that for the controller to be able to inform the individual about the criteria relied on for automated decision-making, the controller must know what these criteria are in the first place. In other words, to that extent, the algorithm may not be a “black box.”
The right to an explanation
Article 22(3) GDPR requires a controller to implement suitable safeguards when designing automated decision-making, which should include at least the right of the data subject to obtain human intervention, to express his or her point of view and to contest the decision. Recital 71 GDPR mentions an extra safeguard: the right to an explanation of a specific automated decision.
The authors who claim that Article 22 GDPR does not provide the right to an explanation point out that this right is only included in the GDPR’s preamble and not in its articles. As confirmed by the European Court of Justice (ECJ), the preamble indeed has no legally binding force (see Case C-134/08, Hauptzollamt Bremen v J.E. Tyson Parketthandel GmbH hanse j., paragraph 16). However, the ECJ explains that this does not deprive the preamble of all meaning; it merely prohibits the use of the preamble to interpret a provision in a manner clearly contrary to its wording (see Case C-308/97, Manfredi v Puglia, paragraph 30).
Article 22(3) GDPR specifies the safeguards that must at least be included in the design of automated decisions. This wording quite clearly leaves room for adopting other safeguards, such as the right to an explanation of a specific automated decision mentioned in Recital 71.
This view is supported by both the WP29 in its Opinion on Automated Decision-Making and Profiling and the Norwegian DPA in its report on Artificial Intelligence and Privacy. In the words of the latter:
“Regardless of what the differences in language mean [authors: whether Article 22 GDPR provides the right to an explanation or not], the controller must provide as much information as necessary in order for the data subject to exercise his or her rights. This means that the decision must be explained in such a way that the data subject is able to understand the result.
The right to an explanation does not necessarily mean that the black box must be opened, but the explanation has to enable the data subject to understand why a particular decision was reached, or what needs to change in order for a different decision to be reached.”
The latter is also known as a “counterfactual explanation,” as described by Wachter, Mittelstadt and Russell. For example, where an individual’s loan application has been denied and the individual wants to know why, a counterfactual explanation could be that the income statements provided by the individual show a yearly income of EUR 50,000, and that the loan would have been granted with a yearly income of EUR 60,000 or more.
Again, in order for the controller to explain the decision in such a way that the individual understands the result (and knows what to change to get a different result), the controller needs to know what the “rules” are that led to the relevant decision (i.e., the algorithm may not be a “black box”).
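The counterfactual explanation in the loan example can be sketched in a few lines of code. This is only an illustration under strong assumptions: the decision is taken to depend on a single rule (a minimum yearly income of EUR 60,000, the figure from the example), and the names `decide_loan` and `Decision` are hypothetical. A real scoring model would combine many criteria, which is precisely why the controller needs to know them.

```python
from dataclasses import dataclass

# Assumption: the single decision rule from the loan example in the text.
INCOME_THRESHOLD_EUR = 60_000


@dataclass
class Decision:
    granted: bool
    explanation: str


def decide_loan(yearly_income_eur: int,
                threshold_eur: int = INCOME_THRESHOLD_EUR) -> Decision:
    """Apply the one-rule decision and attach a counterfactual explanation."""
    if yearly_income_eur >= threshold_eur:
        return Decision(
            True,
            f"Granted: yearly income of EUR {yearly_income_eur:,} meets "
            f"the minimum of EUR {threshold_eur:,}.",
        )
    # The counterfactual states what would have to change for a different outcome.
    shortfall = threshold_eur - yearly_income_eur
    return Decision(
        False,
        f"Denied: the income statements show a yearly income of EUR "
        f"{yearly_income_eur:,}. The loan would be granted with a yearly "
        f"income of EUR {threshold_eur:,} or more "
        f"(EUR {shortfall:,} above the stated income).",
    )


if __name__ == "__main__":
    print(decide_loan(50_000).explanation)
```

The explanation can only be generated because the rule is known and explicit in the code, which is the point made above about the algorithm not being a “black box.”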
Algorithmic accountability requires “white-box” development
Although it is far from set in stone what “white-box” development would require, there are some guidelines to take into account when developing algorithms for automated decision-making (see guidelines on white-box development, below). By documenting these steps and assessments, the controller will also comply with the requirement to perform a data protection impact assessment.
In the words of the WP29:
“Controllers should carry out frequent assessments on the data sets they process to check for any bias, and develop ways to address any prejudicial elements, including any over-reliance on correlations.
Systems that audit algorithms and regular reviews of the accuracy and relevance of automated decision-making including profiling are other useful measures. Controllers should introduce appropriate procedures and measures to prevent errors, inaccuracies or discrimination on the basis of special category data. These measures should be used on a cyclical basis; not only at the design stage, but also continuously, as the profiling is applied to individuals. The outcome of such testing should feed back into the system design.”
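One simple form such a recurring check could take is a comparison of outcome rates between groups. The sketch below is an assumption on our part, not a method prescribed by the WP29: the function names, the sample data and the idea of summarising the result as the ratio between the lowest and the highest selection rate are all illustrative. A low ratio does not by itself establish unlawful discrimination; it flags a difference that the controller must then be able to justify.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of positive outcomes per group.

    `records` is an iterable of (group, selected) pairs, where `selected`
    is True when the automated decision was favourable.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so rates are equal
    return min(rates.values()) / highest


if __name__ == "__main__":
    # Hypothetical audit sample: 10 decisions per group.
    sample = (
        [("group_a", True)] * 5 + [("group_a", False)] * 5
        + [("group_b", True)] * 2 + [("group_b", False)] * 8
    )
    rates = selection_rates(sample)
    print(rates, impact_ratio(rates))
```

Running a check of this kind on each retraining or at fixed intervals, and recording the results, is one way to document the cyclical review the WP29 describes.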
Conclusion: information, explanation or justification?
This article discusses the obligations of controllers with regard to automated decision‑making: must they provide information, an explanation or a justification? The answer is: all three. The main underlying rationales of EU data protection laws are preventing information inequality and information injustice. These rationales can only be served if controllers cannot hide behind algorithms for automated individual decision-making. Controllers will be accountable for the outcome and will therefore ultimately have to be able to justify the criteria on which automated decision-making is based. As indicated at the start, we therefore think the academic debate on the rights of individuals alone misses the bigger picture, with the risk that companies do the same.
Guidelines on White-Box Development:
- A clear, documented design for development at the outset (covering the elements below).
- Verification from the outset that the dataset applied for the training of the algorithm is:
  - Representative (no missing information from particular populations and verification that there are no hidden unlawful biases that are having an unintended impact on certain populations).
  - Accurate and up to date (data collected in another context may be up to date but still lead to inaccurate outcomes). Note that using an existing, unmodified dataset is likely to result in unlawful bias, simply because current situations are rarely unbiased, and this existing bias is rarely lawful. For example, using a dataset of all primary school teachers in the Netherlands will result in an unlawful bias because the algorithm will determine that women are better qualified for this job than men because women are overrepresented in the data set. Unlawful bias can be removed from a data set by, e.g.:
    - Removing data elements that indicate group membership and near proxies thereof. These data elements include direct identifiers of group membership, such as gender, race, religion and sexual orientation. Proxy identifiers may, e.g., be neighbourhood (often a proxy for race) or specific job titles (nurse and navy officer).
    - Adding or modifying elements that result in an unlawful bias. Instead of deleting group membership, group membership indicators can also be modified. For example, in the group of primary school teachers, the gender of a specific number of teachers can be reversed to remove bias. Alternatively, if a certain minority is underrepresented in the data set, this can be compensated for by oversampling the underrepresented community.
    - Repairing of attributes. An example of an attribute that is often biased is the SAT score (research shows that SAT scores are often biased against women due to negative assumptions about the abilities of women, and the resulting stereotyping tends to have a real effect on the outcomes). This can be remedied by splitting the scores into those achieved by men and those achieved by women and dividing each group into quantiles (e.g., top 5%). Then a median score can be calculated for each quantile and attributed to both the women and the men in that quantile.
- Decide on the target variables before starting to select the training data. The controller needs to decide upfront which variables are thought relevant for the relevant selection. If, for example, for recruitment purposes personality traits are included in the selection, such traits must be important enough to job performance to justify their use. Even if automated feature selection methods are used, the final decision to use or not use the results, as well as the choice of feature selection method and any fine-tuning of its parameters, are choices made by humans. These variables need to be documented and must “pass the smell test,” i.e., they must be intuitively relevant and important enough to job performance to be used. For example, a correlation between job applicants using browsers that did not come with the computer (like Firefox) and better job performance and retention will likely not be acceptable.
- Review the outcome of the algorithm (and correlations found) at set stages for unlawful bias and disparate impact and, where present, remove this:
  - Justifiable correlations. Not all correlations found by an algorithm are meaningful, nor can they legitimately be used as a justification for outcomes.
- Consider whether the algorithm can be used in ways that prevent unlawful discrimination. In the recruitment context, consider, for example, blind curation of CVs, e.g., eliminate names, gender, school names and geographical information from the CV before selection of a relevant candidate pool.
- Ensuring auditability of the algorithm.
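The “repairing of attributes” guideline above can be illustrated with a short sketch. It reflects one possible reading of that guideline and is not a definitive implementation: scores are ranked within each group, each person is assigned to a quantile of his or her own group, and everyone in the same quantile (across groups) receives the median of the pooled original scores in that quantile. The function names, the pooling choice, the number of quantiles and the sample scores are all assumptions made for illustration.

```python
from statistics import median


def quantile_index(rank, group_size, n_quantiles):
    """Map a 0-based rank within a group to a quantile bucket (0 = lowest)."""
    return min(rank * n_quantiles // group_size, n_quantiles - 1)


def repair_scores(scores_by_group, n_quantiles=4):
    """Replace each score by the pooled median of its within-group quantile.

    `scores_by_group` maps a group label to a list of scores. The result has
    the same shape, so each repaired score lines up with the original person.
    """
    pooled = {}     # quantile bucket -> original scores from all groups
    placement = {}  # group -> quantile bucket per person, in original order
    for group, scores in scores_by_group.items():
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        buckets = [0] * len(scores)
        for rank, person in enumerate(order):
            bucket = quantile_index(rank, len(scores), n_quantiles)
            buckets[person] = bucket
            pooled.setdefault(bucket, []).append(scores[person])
        placement[group] = buckets
    bucket_median = {bucket: median(values) for bucket, values in pooled.items()}
    return {
        group: [bucket_median[bucket] for bucket in buckets]
        for group, buckets in placement.items()
    }


if __name__ == "__main__":
    # Hypothetical scores in which one group scores systematically lower.
    original = {
        "men": [400, 500, 600, 700],
        "women": [350, 450, 550, 650],
    }
    print(repair_scores(original, n_quantiles=2))
```

After the repair, a person's score depends only on his or her rank within his or her own group, so a systematic gap between the groups no longer carries over into the attribute. Whether that is an appropriate correction in a given case remains a judgment the controller must be able to justify and document.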
Lokke Moerel is Senior Of Counsel at Morrison & Foerster in Berlin, and Professor of Global ICT Law at Tilburg University.
Marijn Storm is an Associate at Morrison & Foerster in Brussels.