Faculty of law blogs / UNIVERSITY OF OXFORD

Law and Autonomous Systems Series: Paving the Way for Legal Artificial Intelligence – A Common Dataset for Case Outcome Predictions


Felix Steffek
Professor of Law and J M Keynes Fellow at the University of Cambridge
Ludwig Bull


This article provides an overview of our current project, which aims to contribute to artificial intelligence (AI) research in law. We are preparing a standardised dataset of 100,000 US court cases to test AI approaches for analysing court decisions and predicting case outcomes. We will make this dataset publicly available with the aim of creating a community around this research. Applications based on the dataset can provide new insights for private actors (such as parties and their advisors), for dispute resolution institutions (such as courts and ombudspersons), for lawmakers (such as parliaments and regulators) and for researchers.

This article deals with three issues: What are the characteristics of the dataset, and what is the process of its creation? What are the benefits of our dataset for the research and application of AI in law? How can private actors, institutions, lawmakers and researchers put the dataset to good use?

The dataset and its creation

The cases in the database are decisions handed down by various US federal courts. The judgments cover all areas of law that are litigated in the US federal courts of appeals, including federal criminal law, constitutional law, intellectual property law, bankruptcy law and securities law. The features of the judgments identified in the database are: (1) court (e.g. Court of Appeals for the First Circuit), (2) district of origin, (3) appellant, (4) respondent, (5) decision of the court (e.g. affirm or reverse), (6) judges, (7) cases cited in the judgment, (8) year of the decision, (9) month of the decision, (10) name of the case, (11) facts of the case, (12) opinions given by judges, (13) text of the judgment, and (14) a unique database-specific ID number for each case. This structure facilitates the quantitative analysis of judgments in bulk. The purpose of this database is not for lawyers to read judgments, but for scientists to apply machine-learning programmes. To the best of our knowledge, providers of legal data do not currently offer judgments structured in a way that allows such applications.
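As a rough illustration of what one structured record could look like, the sketch below maps the fourteen features listed above onto a Python dataclass. The field names, the types and the helper `reversal_rate_by_court` are our assumptions for illustration only; they are not the actual schema of the dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseRecord:
    """One judgment with the fourteen listed features (field names are hypothetical)."""
    case_id: str              # (14) unique database-specific ID
    court: str                # (1) e.g. "Court of Appeals for the First Circuit"
    district_of_origin: str   # (2)
    appellant: str            # (3)
    respondent: str           # (4)
    decision: str             # (5) assumed encoding: "affirm" or "reverse"
    judges: List[str]         # (6)
    cases_cited: List[str]    # (7)
    year: int                 # (8)
    month: int                # (9)
    case_name: str            # (10)
    facts: str                # (11)
    opinions: List[str]       # (12)
    text: str                 # (13) full text of the judgment


def reversal_rate_by_court(records):
    """Return, for each court, the share of its judgments that reverse."""
    totals, reversals = {}, {}
    for record in records:
        totals[record.court] = totals.get(record.court, 0) + 1
        if record.decision == "reverse":
            reversals[record.court] = reversals.get(record.court, 0) + 1
    return {court: reversals.get(court, 0) / n for court, n in totals.items()}
```

Once the judgments are stored in this form, a bulk question such as "how often does each court reverse?" becomes a short loop rather than a reading exercise.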

The court judgments were downloaded from public databases that are accessible via the internet. The main challenge in creating the dataset is to find a way to extract specific pieces of information from the text of each judgment. For example, it is necessary to reliably identify the outcome of each of the 100,000 cases. Custom software was built to automate this process because it would not have been feasible to manually read and classify all judgments. Automatic extraction is challenging because the formatting and the terminology used in the judgments vary greatly by time and author. Some judges describe the outcome as ‘decision affirmed’ while others prefer to use the phrase ‘judgment in favour of the defendant’.

It is not possible to identify the outcome of all cases using one simple rule, as the variation in the data is too large. However, it is possible to identify the outcomes of a small percentage of the cases very accurately using a combination of simple rules. For example, it is possible to check how often the word ‘reversed’ appears in the last 10% of the text of the judgment (assuming there was no dissent) and to also record the number of times the word ‘affirmed’ appears. If the word ‘reversed’ appears at least three times and the word ‘affirmed’ does not appear at all, it is safe to assume that the judgment reverses the lower court’s decision. This particular ‘simple rule’ is only one of many simple rules that together exclude with adequate accuracy the possibility of misclassification and account for polysemy.
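The example rule can be sketched in a few lines of Python. This is a minimal illustration and not our production code: the function name, the `has_dissent` flag and the choice to measure the ‘last 10%’ in characters rather than words are assumptions made for the example.

```python
import re


def label_by_simple_rule(text, has_dissent=False, tail_fraction=0.10, min_hits=3):
    """Return 'reversed' when the rule fires, otherwise None (no label).

    Assumption: the 'last 10%' is measured in characters of the judgment text.
    """
    if has_dissent:
        # The rule is only meant for judgments without a dissent.
        return None
    tail_start = int(len(text) * (1 - tail_fraction))
    tail = text[tail_start:].lower()
    n_reversed = len(re.findall(r"\breversed\b", tail))
    n_affirmed = len(re.findall(r"\baffirmed\b", tail))
    if n_reversed >= min_hits and n_affirmed == 0:
        return "reversed"
    return None
```

Returning `None` rather than a guess is deliberate: a rule of this kind is meant to be right whenever it speaks and to stay silent on everything else.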

Although these rules would not allow us to determine the outcome of all cases with sufficient accuracy, this procedure identifies with near perfect accuracy a small number of cases that are definitely to be classified as ‘affirmed’ or ‘reversed’. At the first stage, these cases become the training data for an AI algorithm (we are using a long short-term memory (LSTM) neural network trained with backpropagation) that learns what ‘affirming’ and ‘reversing’ judgments look like in general. At the next stage, the neural network classifies the remaining cases according to the rules learned from the training data.
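The two stages can be sketched as follows. To keep the example self-contained, it swaps in a small bag-of-words naive Bayes classifier for the LSTM network described above; the function names and the `rule` argument (any function that returns a label or `None`) are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(labelled):
    """labelled: list of (text, label) pairs produced by the simple rules."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in labelled:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label under add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def bootstrap_labels(cases, rule):
    """Stage 1: the rule labels what it can. Stage 2: the model labels the rest."""
    seed = [(text, rule(text)) for text in cases if rule(text) is not None]
    model = train_naive_bayes(seed)
    return [rule(text) or predict(model, text) for text in cases]
```

The design point is that the rules only need to be precise, not complete: the classifier generalises from the few cases they label to the many they leave open.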

In order to ‘understand’ the general characteristics of ‘affirming’ or ‘reversing’ judgments, the algorithm ‘learns’ from its mistakes during the training process. In this process, the algorithm is presented with the complete text of an individual judgment and is asked to predict whether the judges affirmed or reversed the decision of the preceding court. This prediction is then compared with the actual outcome. If the predicted outcome and the real outcome do not match, the system learns where it went wrong and updates itself to make a better prediction in the future. This process is repeated for all cases in the training data. The AI algorithm improves step by step. These trained AI algorithms are then tested and currently determine the outcome of all 100,000 cases in the database with an accuracy of 97%.
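The predict, compare and update cycle can be illustrated with a much simpler learner than the one we use. The sketch below is a word-level perceptron, not an LSTM trained with backpropagation; it only shows the mistake-driven logic and how accuracy is measured. All names are assumptions for the example.

```python
from collections import defaultdict


def words(text):
    return text.lower().split()


def predict_outcome(weights, text):
    """A positive total word weight means 'reversed', otherwise 'affirmed'."""
    score = sum(weights[word] for word in words(text))
    return "reversed" if score > 0 else "affirmed"


def train(cases, epochs=5):
    """cases: list of (text, outcome), outcome in {'affirmed', 'reversed'}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, outcome in cases:
            # Compare the prediction with the real outcome ...
            if predict_outcome(weights, text) != outcome:
                # ... and update only when the prediction was wrong.
                step = 1.0 if outcome == "reversed" else -1.0
                for word in words(text):
                    weights[word] += step
    return weights


def accuracy(weights, cases):
    """Share of cases whose predicted outcome matches the real outcome."""
    correct = sum(predict_outcome(weights, text) == outcome for text, outcome in cases)
    return correct / len(cases)
```

Accuracy should be measured on cases that were not used for training; otherwise the figure mostly reflects memorisation.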

This procedure has been repeated for all of the case features extracted so far. Further attributes such as area of law, or names of the lawyers for each party, will be extracted in the future. We are also interested in extracting higher-level information, for example, whether a judgment supports a particular public policy such as a bias for rescue and against liquidation in corporate bankruptcy cases.

Expected benefits

Why do we expect this common dataset to be a worthwhile project? Standardised datasets are already used in areas of AI research other than law. For example, CIFAR-10 and ImageNet provide datasets to compare progress in the areas of object detection and image classification. The Stanford Sentiment Treebank offers a dataset of movie reviews for AI analysis. Beyond AI, structured datasets have been used successfully in law-related research for some time. A good example is the UCLA-LoPucki Bankruptcy Research Database offering data concerning the insolvency of large public companies.

We will offer our dataset free of charge to download and use for everyone who is interested, be it researchers, state institutions or private enterprise. In essence, our database project provides four services to its users: (1) choosing useful cases, (2) sourcing the relevant data, (3) structuring the data in such a way that AI applications can easily be tested without further work being necessary, and (4) providing easy access to the data.

We think that the case for offering a common dataset is strong, and we would like to mention just a few selected reasons here. Researchers and developers in legal AI will be in a position to systematically approach various problems related to court decisions because they can compare and contrast each other’s approaches to find the best technical or theoretical solution to the problem. In addition, a common dataset reduces the cost of access to structured data. Only those engaged in gathering the data and setting up the database need to invest time and other resources. Further, a common dataset encourages a ‘spirit of competition’ for researchers to form a community around the challenges raised by court decisions. The dataset facilitates the comparison of approaches, the identification of the best solution and a collaborative effort to improve the state of the art in AI applications to law.

Beyond using the dataset to compare approaches and applications, the creation and structure of the dataset itself can become the subject of academic discussion. Here, the focus is on the qualities that representative datasets and the processes creating them should have. To facilitate such a discussion, the sources and rules employed in the construction of the dataset need to be transparent.

Applications based on the dataset

Finally, we would like to mention possible applications that could be built around the dataset. A core application of the dataset will be for the prediction of case outcomes. In this context, the dataset can be used to identify the factors that determine the outcomes of litigation. This includes factual and legal factors as well as meta-factors such as lawyer characteristics and the preferences of decision-makers. The predictive power of legal AI is interesting from a number of perspectives.

Private actors: From the perspective of private actors, AI applications can improve the prediction of results of dispute resolution mechanisms. Better information on future outcomes helps parties and advisors to reduce the cost of dispute resolution. For example, if the expected value of litigation is lower than the expected cost, there is no economic incentive to litigate. AI-based knowledge of the likelihood of success in court vastly improves the accuracy of calculating the expected value of litigation. Further, if parties are in a better position to predict the outcome of contentious dispute resolution, they are also in a better position to negotiate amicable settlements.
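The economic reasoning can be made concrete with a short calculation. The sketch below assumes the simplest possible model, in which the expected value of litigation is the probability of success multiplied by the amount awarded; the function names and figures are illustrative only.

```python
def expected_value_of_litigation(p_win, award):
    """Probability of success multiplied by the amount awarded on success."""
    return p_win * award


def has_incentive_to_litigate(p_win, award, expected_cost):
    """True only if the expected value exceeds the expected cost."""
    return expected_value_of_litigation(p_win, award) > expected_cost


# A claim worth 100,000 with expected costs of 30,000:
print(has_incentive_to_litigate(0.25, 100_000, 30_000))  # expected value 25,000
print(has_incentive_to_litigate(0.50, 100_000, 30_000))  # expected value 50,000
```

The decision turns entirely on the estimate of `p_win`, which is why a more accurate prediction of success in court matters so much to parties and their advisors.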

Dispute resolution institutions (courts, arbitration providers, ombudspersons etc): Courts can use our dataset to develop applications supporting the analysis of their decision-making processes. Such applications may involve gathering data on the factors relevant to decision outcomes. Furthermore, comparisons between different areas of the law (court internal), but also between courts (court external) are possible. For example, the analysis can reveal whether courts apply the law in a consistent way.

AI tools could be used to suggest certain elements of a decision based on an analysis of the facts of the specific case at hand. Obviously, involving AI in the decision-making process raises sensitive issues of justice in terms of both results and procedure. Results might be affected by biases in the decisions the AI was trained with. The procedure might raise concerns as the decision-making process would no longer be entirely in the hands of humans.

The discussion to be had is whether satisfactory solutions to these concerns can be found. The parties and the courts might agree on the approaches the AI takes. For example, they might agree to exclude specific features in AI-assisted decision-making because it would be unjust to take that information into account. Race should not be a variable that influences the AI’s analysis. The parties and the courts might also agree to use only linear classifiers because it is easier to explain their consequences in individual cases. In UK courts, a consent-based approach is already taken for algorithms analysing documents in compliance and due diligence cases.
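Both safeguards can be illustrated together. In the sketch below, agreed-upon features are removed before scoring, and a linear model reports how much each remaining feature contributed to the result. The feature names, weights and function names are hypothetical and serve only to show why linear classifiers are comparatively easy to explain.

```python
def drop_excluded(features, excluded):
    """Remove features that the parties and the court agreed not to use."""
    return {name: value for name, value in features.items() if name not in excluded}


def explain_linear_score(weights, features, bias=0.0):
    """Return the linear score and the contribution of each feature to it."""
    contributions = {
        name: weights.get(name, 0.0) * value for name, value in features.items()
    }
    return bias + sum(contributions.values()), contributions


# Hypothetical weights and case features, for illustration only.
weights = {"prior_appeals": 0.5, "claim_amount": -0.25}
case = {"prior_appeals": 2.0, "claim_amount": 4.0, "race": 1.0}

score, contributions = explain_linear_score(weights, drop_excluded(case, {"race"}))
```

Because the score is a plain sum, each entry in `contributions` states exactly how much one feature moved the result in the individual case.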

The frontier of using AI in dispute resolution is the quantitative assessment of legal problems without precedent or with unclear legal status. This raises the question of whether we can use machine learning to make sense of particular legal problems that are not easily intelligible via an approach that is based on the analysis of a large number of pre-existing similar cases.

Lawmakers (e.g. parliaments and regulators): Similar to the dispute resolution institutions, those involved in lawmaking can use AI applications based on our dataset to improve the legal rules for dispute resolution through a better understanding of the factors driving decisions. In addition, lawmakers could build AI approaches into the systems of dispute resolution. One example is using AI to provide suggestions of dispute resolution in low value cases as AI offers a benefit-cost ratio that cannot be matched by human decision-makers. This is not to say, of course, that such developments should be implemented uncritically (if at all). Instead, lawmakers need to critically engage with the possibilities and limits of AI in dispute resolution.

Researchers and educators: Last, but not least, we hope that researchers and educators will benefit in a plethora of ways from our dataset. AI can help at various stages of research: first, in structuring information; second, in analysing the structured information; and third, in predicting outcomes to generate new information.

Researchers could analyse cases using AI in ways similar to how statistical methods have been employed to analyse numerical data. The insight AI promises is to unearth the information based in text (as opposed to numbers). This creates new avenues for text-based research, which until now required massive time investments by researchers, e.g. by manually implementing case-based research methods. Increasing the number of cases analysed by AI compared with manual research done by humans will strengthen the reliability of results and the hypotheses that can be tested.

In addition, scientists might engage in exploring whether AI can be used to develop new or complex reasoning. Here, the question is whether we can design systems that will help (human) lawyers or judges to find new arguments that have not been formulated before. More recent research in AI has yielded systems that can generate completely new images or texts based on a large number of real images or texts. For example, these systems are able to generate images of human faces, even though these particular human faces do not exist in reality. Can we design similar systems that discover new legal arguments we have not yet considered? Might it even be possible for systems to inform our analysis of unprecedented cases in the future?


Our project introduces a fundamental building block of AI research in legal decision-making: a dataset of 100,000 court decisions. The dataset contains not only structured information on the decisions (for example facts and opinions), but also meta-data (including information on judges and the parties). The dataset was created by downloading judgments from public databases and leveraging neural networks to extract the relevant information from the text itself. This dataset is a prerequisite for the systematic exploration of legal AI, and we hope that it will inspire private actors, institutions, lawmakers as well as scientists and educators to build useful systems and investigate legal processes. Ultimately, these tools may help to better understand the logic and nature of law itself. Our dataset is the largest dataset of court cases and accompanying meta-data that is freely available in the world. We hope that this dataset will advance research by providing a benchmark for future legal AI systems.

Ludwig Bull is the Scientific Director at CaseCrunch, an AI startup specializing in legal decision predictions.

Felix Steffek is a University Lecturer at the Faculty of Law and a Senior Member of Newnham College, University of Cambridge.


