Video: Big Data and Discrimination

Posted:

21 December 2018

Time to read:

2 Minutes

Author(s):

Talia Gillis

Associate Professor of Law at Columbia Law School from July 2020 onwards

Jann Spiess

For many financial products, such as loans and insurance policies, companies distinguish between people based on their different risks and returns. However, the ability to distinguish between people by trying to predict future behavior or the profitability of a contract is often restrained by legal rules that aim to prevent certain types of discrimination. Many of these rules were developed to challenge human discretion in setting prices and provide little guidance in a world where firms set credit terms based on sophisticated statistical methods and a large number of factors. The rise of artificial intelligence and ‘big data’ raises the question of where and how existing law can be applied to this novel setting, and where it must be adapted to remain effective.

In our recent working paper, ‘Big Data and Discrimination’, we bridge the gap between old law and new methods by proposing a framework that brings together existing legal requirements with the structure of algorithmic decision-making in order to identify tensions and lay the ground for legal solutions. Focusing on the example of credit pricing, we examine each step in the construction of an automated pricing rule alongside the regulatory opportunities and challenges it raises.

When algorithms make decisions, opaque human behavior is replaced by a set of rules constructed from data. Specifically, we consider prices that are set based on predictions of mortgage default. We connect machine learning decision rules and current law by considering the three stages of a pricing decision, which we demonstrate in a simulation exercise. First, we consider the data ‘input’ stage of the pricing decision. Second, we discuss the ‘decision process’ stage, in which the data inputs are used to produce a pricing rule. Finally, we consider the ‘output’ stage, which consists of the actual pricing outcomes. Our simulated data are based on real mortgage applicants in the Boston Home Mortgage Disclosure Act (HMDA) dataset; we impute default probabilities from a combination of loan approvals and calibrate them to overall default rates. The simulated data allow us to demonstrate several of our conceptual arguments and the methodological issues we discuss.

Based on our framework, we argue that legal doctrine is ill-prepared to face the challenges posed by algorithmic decision-making in a big-data world. While automated pricing rules promise increased transparency, this opportunity is often confounded. Unlike in human decision-making, the exclusion of data from consideration can be guaranteed in the algorithmic context. However, forbidding inputs alone does not assure equal pricing and can even increase pricing disparities between protected groups. Moreover, the complexity of machine learning pricing limits the ability to scrutinize the process that led to a pricing rule, frustrating legal efforts to examine the ‘conduct’ that led to disparity.
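
The point that excluding an input does not guarantee equal pricing can be illustrated with a toy example. The sketch below is not the simulation from the working paper and does not use the HMDA data: the synthetic population, the 80% agreement between the proxy and the protected attribute, the default rates, and the function names `simulate`, `fit_rule` and `mean_price_by_group` are all assumptions made for illustration. The "price" is simply the predicted default rate, and the rule never sees the protected attribute.

```python
import random


def simulate(n=50_000, seed=0):
    """Draw a synthetic applicant pool (all parameters are assumptions)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hypothetical protected attribute, coded 0/1.
        group = 1 if rng.random() < 0.5 else 0
        # A permitted input (say, a neighborhood flag) that matches
        # the protected attribute 80% of the time.
        proxy = group if rng.random() < 0.8 else 1 - group
        # In this synthetic world, default risk differs by group.
        p_default = 0.10 if group == 1 else 0.05
        default = 1 if rng.random() < p_default else 0
        rows.append((group, proxy, default))
    return rows


def fit_rule(rows):
    """Fit a pricing rule on the proxy only; the group is excluded."""
    rule = {}
    for value in (0, 1):
        outcomes = [d for _, p, d in rows if p == value]
        # Price = observed default rate among applicants with this proxy value.
        rule[value] = sum(outcomes) / len(outcomes)
    return rule


def mean_price_by_group(rows, rule):
    """Average quoted price per protected group under the fitted rule."""
    prices = {}
    for g in (0, 1):
        quoted = [rule[p] for grp, p, _ in rows if grp == g]
        prices[g] = sum(quoted) / len(quoted)
    return prices


rows = simulate()
rule = fit_rule(rows)
prices = mean_price_by_group(rows, rule)
print("price by proxy value:", rule)
print("average price by group:", prices)
```

Under these assumed parameters, the expected average price is about 0.084 for group 1 and about 0.066 for group 0, even though the protected attribute is never used as an input. The sketch shows only that a disparity can persist through a correlated input; it does not reproduce the stronger result that excluding inputs can increase disparities.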

On the other hand, the reproducibility of automated prices creates new possibilities for more meaningful analysis of pricing outcomes. The observability of decision rules expands the opportunities for controlled and preemptive testing of pricing practices. Analysis of outcomes becomes attractive in the context of algorithmic decision-making given the limitations of analyzing the input and decision process stages. Moreover, outcome analysis in this new context is not limited to actual prices paid by consumers: because we are able to observe the decision rule for future prices, forward-looking analysis of decision rules becomes possible. This type of analysis is especially useful for regulators who enforce antidiscrimination law.
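
Forward-looking outcome testing of this kind might look roughly like the following. This is a minimal sketch under stated assumptions, not a procedure prescribed by the paper or by any regulator: the function `audit_pricing_rule`, the example rule, the income threshold, and the four test applicants are hypothetical. The idea is that a fixed pricing rule can be applied to a chosen test population before any consumer is quoted a price, and the resulting prices compared across groups.

```python
from collections import defaultdict


def audit_pricing_rule(pricing_rule, applicants, group_key="group"):
    """Apply a fixed pricing rule to a test population and
    return the average quoted price for each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for applicant in applicants:
        # The rule only sees the non-protected features.
        features = {k: v for k, v in applicant.items() if k != group_key}
        totals[applicant[group_key]] += pricing_rule(features)
        counts[applicant[group_key]] += 1
    return {g: totals[g] / counts[g] for g in totals}


def example_rule(features):
    """Hypothetical rule: a base rate plus a surcharge below an income cutoff."""
    return 4.0 + (1.0 if features["income"] < 50_000 else 0.0)


# Hypothetical test population chosen by the auditor.
test_applicants = [
    {"group": "A", "income": 40_000},
    {"group": "A", "income": 45_000},
    {"group": "B", "income": 40_000},
    {"group": "B", "income": 90_000},
]

average_prices = audit_pricing_rule(example_rule, test_applicants)
print(average_prices)
```

Because the rule is an ordinary function, the same audit can be rerun on any test population, which is what makes the analysis controlled and preemptive rather than dependent on prices already paid.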

Watch the OBLB interview with Talia Gillis to learn more about this project: