From Optional to Obligatory: Why AI’s Statistical Superiority Doesn’t Dictate Tort Law Duties

Artificial intelligence (AI) technologies are increasingly found to outperform humans in areas such as investment decisions, medical diagnosis, and even creativity. The drive to integrate these technologies into society stems not only from a desire to enhance efficiency but also from a commitment to upholding the highest standards of care. Where AI technologies outperform human judgment, the question arises whether the standard of care should require the use of AI.

At the ‘Intersections of Private Law’ conference at the University of Sydney, I argued that mere statistical superiority is insufficient for English tort law to impose a duty to use and follow analytical AI-enabled devices. Failure to use such a device, or to follow its AI-generated results, therefore does not necessarily constitute a breach of the duty of care. Even if the device demonstrated greater accuracy (the ‘hit rate’ of a model) or a lower average error size than traditional methods or human experts in test situations, a plaintiff may be unable to establish that the defendant acted unreasonably because of limitations of knowledge, and so fail in a negligence claim (I.). These limitations of knowledge and resources can be addressed through AI transparency (II.) and through ‘process-based’ statutory or institutional guidance (III.).

I. Problem: Negligence Calculus

While negligence claims involving AI can arise in various professional settings, including lawyers using litigation risk analysis software or investment advisors using financial analysis tools, AI is particularly advanced in medical diagnostics. Consider a patient who contends that their clinician should have used an AI-enabled device to spot a (missed) early-stage cancer, or should have followed the device’s diagnosis rather than overruling it. The onus is on the patient to prove that the clinician breached the duty of care. The standard of care is based on a number of factors, such as the size of the risk, the gravity of harm, the cost of taking precautions, and the utility of the defendant’s conduct (the ‘negligence calculus’). Assessing the (marginal) financial, technical, and operational affordability of an AI application, including the costs of training, integration into existing workflows, and compliance with regulatory requirements, poses a significant challenge. Anticipating and recognizing the potential risks of using an AI-enabled device is more demanding still, for two main reasons.
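To make the difficulty concrete, the balancing can be sketched in marginal terms. This is an illustration only, not a formula that English courts apply, and the symbols are introduced here purely for exposition:

\[ \Delta B \;<\; \Delta P \times L \]

where \(\Delta B\) is the additional cost of acquiring, integrating, and following the device, \(\Delta P\) is the reduction in the probability of harm compared with unaided professional judgement, and \(L\) is the gravity of the harm (the utility of the defendant’s conduct is left to one side). A duty to use the device would plausibly arise only where the left-hand side is clearly outweighed by the right; the two points below explain why \(\Delta P\) in particular is so difficult for the user to estimate.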

First, it will be difficult for AI users to assess the absolute size of the risk, namely the probability of the device making an incorrect prediction. Owing to the inherent opacity and complexity of AI decision-making, the models, unlike a human expert, do not explain their predictions. The user will have statistical information on the device’s performance during its training and validation phases, yet there are currently no concrete standards for training and test data, nor domain-specific benchmarks. The clinician may not know the quality and variety of that data, or whether their own clinical practice resembles the test population. The model may also carry systematic biases that bear on the clinician’s patients, or it may fail to capture the interdependencies between the combinations of factors that characterize medical conditions.
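To illustrate why headline performance figures may not carry over to a particular practice, suppose, purely hypothetically, that the device’s sensitivity and specificity did transfer unchanged from its validation data. The probability that a positive result is actually correct would still depend on the prevalence of the condition among the clinician’s own patients:

\[ \mathrm{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})} \]

A device validated on a high-prevalence cohort may therefore be far less informative, and generate far more false alarms, in a setting where the condition is rarer, quite apart from any biases carried over from the training data.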

Second, it will be difficult to assess the marginal size of the risk. Unlike in drug risk assessment (where the risks of side effects may also vary between patients), the relevant benchmark is not a standardized alternative: clinicians will need to compare the performance of the device against their own judgement. This is where AI systems differ from traditional technologies that do not (attempt to) mimic cognitive processes. Given that each clinician has unique skills and experience, this individualized benchmarking is a daunting task.

The first point, namely the challenge of translating the probabilistic evaluation of AI performance to individual patient cases, is illuminated by a long-standing debate in evidence theory about the mismatch between mathematical probability and adjudicative fact-finding. In XYZ Ltd v Schering Health Care Ltd [2002] EWHC 1420, the causation issue was whether the plaintiff’s ingestion of oral contraceptive pills manufactured by the defendant was a cause of the plaintiff’s cardiovascular injuries. Because the ‘but for’ test for causation could not be effectively applied given the multiple contributing factors, the court applied the ‘doubling of the risk’ test to determine whether there had been a substantial increase in risk. The test, which can only be applied with complete and sound data (Sienkiewicz v Greif Ltd [2011] UKSC 10), is satisfied if the defendant’s negligence more than doubles the plaintiff’s risk of harm.
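The arithmetic behind the ‘doubling’ threshold, a standard illustration rather than a passage drawn from the judgments themselves, shows why that particular figure matters on the balance of probabilities:

\[ RR = \frac{\text{risk of injury among the exposed}}{\text{risk of injury among the unexposed}}, \qquad P(\text{causation}) \approx \frac{RR - 1}{RR} \]

so a relative risk greater than 2 implies a probability above one half, that is, it becomes more likely than not that the exposure rather than the background risk caused the particular plaintiff’s injury.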

Some evidence scholars, however, consider judicial fact-finding to be fundamentally incompatible with mathematical probability. That is not only because reliance on bare statistics undermines autonomy and individuality by denying that the defendant may diverge from their peers. As noted above, observed associations may be merely coincidental or due to confounding factors. Statistics must therefore be supplemented with further information to reach a meaningful conclusion. When assessing epidemiological evidence of a causal relationship, courts have referred approvingly to the Bradford Hill criteria, such as ‘plausibility’ and ‘coherence’ (Reay v British Nuclear Fuels Plc [1994] 5 Med LR 1; Ministry of Defence v Wood [2011] EWCA Civ 792).

II. Strategy: AI Transparency

Lessons can be drawn from this debate for the use of AI, and not only for proving factual causation. How courts and legal scholars have dealt with reliance on probabilities in causation underlines the need for explanatory rationales that enable users to determine the ‘size of risk’ in using an AI-enabled device, that is, the probability that its use will cause harm. Even where a device statistically outperforms humans, a duty to use an AI-enabled device or to follow an AI-generated output can usually be established only if there is additional information (‘transparency’).

The information should be easily accessible in order to avoid overwhelming the AI user; otherwise, the ‘cost of taking precautions’ could prevent AI use from being required under the applicable standard of care. On a positive note, this transparency not only assists AI users in making informed decisions but also enables courts to establish factual causation without relying solely on statistical evidence.

III. Strategy: ‘Process-based’ Statutory or Institutional Guidance

Additionally, the limitations of knowledge and resources can be addressed by ‘process-based’ statutory or institutional guidance. The violation of a statutory duty, such as requirements set out under (future) UK equivalent(s) of the EU’s AI Act, may imply a breach of the duty of care if the provision is intended to protect the plaintiff (‘statutory negligence’). Recommendations from public authorities will also become increasingly influential, particularly in high-risk environments such as medical diagnostics. At least in medicine, guidelines from bodies such as the National Institute for Health and Care Excellence (NICE) set out a standard of care, and NICE has already been testing AI-enabled devices. As stated in Bolam v Friern Hospital Management Committee [1957] 1 WLR 583, a clinician will not be considered negligent if they acted in accordance with a practice accepted as proper by a responsible body of medical opinion. This approach shifts some of the decision-making burden from the clinician to a body that has the resources to make an assured decision about when devices are safe enough to be required under the applicable standard of care. Such decisions are more robust, more defensible, and more likely to be accepted by the public.

Amelie Sophie Berz is a DPhil Candidate at Worcester College and Lecturer in Law at Keble College, University of Oxford.
