
Putting a Human in the Loop: Increasing Uptake, but Decreasing Accuracy of Automated Decision-Making

Author(s)

Daniela Sele
Research Affiliate at ETH Zurich & a Diplomat at the Liechtenstein Mission to the EU
Marina Chugunova
Senior Research Fellow at the Max Planck Institute for Innovation and Competition


The growing capabilities of algorithms and AI systems have led to their increasing use in critical economic and legal decisions. Companies and public institutions now employ automated decision support for tasks such as assessing job applications, setting salaries, and even making bail and parole decisions (eg, Kleinberg et al., 2018). This increasing reliance on automated decision support, and the growing number of its potential applications, has prompted calls for policies that put a ‘human in the loop’ to maintain human agency and accountability, provide legal safeguards, or perform quality control. Responding to this call, the notion that humans should monitor algorithms has been integrated into some regulations (eg, Article 22 of the European Union’s GDPR) and regulatory proposals (eg, Article 14 of the proposed EU AI Act). In an ideal world, the monitor intervenes when the algorithm makes erroneous or discriminatory decisions, but not when its decisions are correct and fair.

How people interact with and incorporate algorithmic recommendations has been a focus of scientific attention. Existing behavioral research raises concerns about how smoothly human monitors and automated recommendation systems interact: people’s responses to automated decision support vary, with some exhibiting aversion to algorithms and others showing appreciation. Users tend to underutilize algorithmic recommendations, failing to incorporate them into their decisions, and at times they over-rely on them, failing to correct mistakes (for an overview see Chugunova and Sele, 2022). Although human agency in interacting with automated decision support systems is central to legal and policy discussions, the current empirical evidence offers limited guidance on its role.

We conducted an experiment to address two key questions (Sele and Chugunova, 2022). First, we explored the impact of transitioning from fully automated decision-making to a 'Human-in-the-Loop' system on people's preference for algorithmic decision support. Second, we investigated whether human monitoring influences the accuracy of decisions.

In the experiment, conducted in Fall 2021, 292 students at ETH Zurich, a leading technical university in Europe, predicted others’ performance in a math test based on short personal profiles. They had the option to receive decision support in the form of estimates from either human participants or a statistical model. The estimates from human participants originated from a pre-study in which a sample of US residents completed the same task. The statistical model was developed by Dietvorst et al. (2018) using data from the High School Longitudinal Study of 2009. From the pre-study, we selected human estimates that closely matched those of the model: for each profile, the recommendation participants could receive was almost identical regardless of its source. Participants were not told this, but they were informed that both sources of decision support performed equally well on average, making an average mistake of 15 to 20 percentiles. When participants delegated the decision completely and could not adjust the recommendation they received, they preferred to delegate to the algorithm rather than to an equally accurate human 66% of the time. Allowing participants to adjust the recommendations increased the preference for the algorithm by a further 7 percentage points, suggesting that human oversight promotes uptake. Participants in the condition with oversight also reported greater confidence in their predictions.

Regarding engagement with the recommendations, and in line with automation bias, participants were less likely to adjust algorithmic recommendations, even though the recommendations were almost identical regardless of source. In our experimental environment, their adjustments decreased the accuracy of the final estimates. Within the Human-in-the-Loop condition, participants appeared to struggle to adjust recommendations that stemmed from the algorithm: predictions submitted following an algorithmic recommendation were (insignificantly) less accurate. Given that keeping a human in the loop aims to ensure the quality of final decisions, it is perhaps most problematic that the human monitors were less likely to adjust recommendations containing larger errors than those containing smaller ones, regardless of the source, thus failing to serve as an ‘emergency brake.’ Moreover, the adjustments made to recommendations with larger errors also tended to be smaller.

Our findings reveal a trade-off: while human oversight boosts the uptake of automated decision support, it may compromise decision accuracy. How this trade-off plays out is likely to be context-specific: it depends, for example, on how reliable the automated decision support is, how much expertise the human monitor has, and whether the monitor can access and reasonably process the inputs on which the algorithm based its recommendation. The evidence from our study, however, highlights that keeping a human in the loop is not a silver bullet, and that 'Human-in-the-Loop' systems must be designed carefully, with the balance between adoption and precision in mind.

Daniela Sele is a Research Affiliate at ETH Zurich & a Diplomat at the Liechtenstein Mission to the EU.

Marina Chugunova is a Senior Research Fellow at the Max Planck Institute for Innovation and Competition.

The post summarises the results of the study: Sele, Daniela and Chugunova, Marina, Putting a Human in the Loop: Increasing Uptake, but Decreasing Accuracy of Automated Decision-Making (November 18, 2022). Max Planck Institute for Innovation & Competition Research Paper No. 22-20. 

The authors’ full article can be found here.

References

Chugunova, Marina and Daniela Sele (2022). ‘An interdisciplinary review of the experimental evidence on how humans interact with machines.’ In: Journal of Behavioral and Experimental Economics, p. 101897.

Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan (2018). ‘Human decisions and machine predictions.’ In: The Quarterly Journal of Economics 133.1, pp. 237–293.


