Faculty of law blogs / UNIVERSITY OF OXFORD

Social Platforms and Their User Terms: Complex, Long, and Rather Adversarial


Tim R Samples
Associate Professor of Legal Studies at the Terry College of Business at the University of Georgia
Katherine Ireland
Interim Head of the DigiLab at the University of Georgia


Life as a modern consumer is a relentless parade of standardized contracts. Almost all our activity online is governed by dense, lengthy texts such as terms-of-use (TOUs) and privacy policies. Whether we read them or not, they play a pivotal role in digital governance.

Social platforms mediate important digital rights for much of humanity—not to mention vast amounts of data and online behavior. Their TOUs are the most widespread contracts in human history. The terms at Facebook alone apply to almost 3 billion users; TikTok’s terms apply to over a billion.

Yet, almost no one even glances at those texts. Most of us (correctly) assume that reading them is difficult and time-consuming.

A deep tension has emerged. While popular culture mocks the idea of reading TOUs, contract law assumes something very different. In effect, TOUs are almost always binding despite how absurd reading them might seem.

Our research (with co-author Caroline Kraczon, forthcoming in Berkeley Technology Law Journal) examines that tension with a variety of new methods and approaches to the data.

Data & Methods

Our study is interdisciplinary: we combine legal analysis with natural language processing and corpus linguistics.

First, we built a corpus. We selected seventy-five social platforms and scraped their TOUs, privacy policies, and community guidelines. Our corpus includes 195 separate texts with 944,459 words: 75 TOUs (504,025 words), 74 privacy policies (325,793 words), and 47 community guidelines (114,641 words).

The Corpus


Then we collected metadata about legal tendencies in the TOUs themselves such as governing law, dispute resolution, and modification rights. Finally, we ran a series of computational tests to measure the linguistic characteristics of our corpus.

Key to our study is the use of multiple corpora. That is, we ran our analysis on our social platform corpus as well as the Brown Corpus of American English[1] and a corpus of Jane Austen’s Collected Works[2]. The Brown Corpus represents a broad spectrum of modern English whereas the Austen Corpus offers a contrast with literary prose. These comparisons lend depth and context to our findings.

Our results emphasize several points about platform-consumer contracts: (1) in comparison with other genres of English, they feature extremely complex linguistic structures; (2) they are very, very long; and (3) they contain highly unilateral legal terms. Finally, (4) they are becoming more like themselves: they appear to have grown significantly more difficult, longer, and more unilateral in recent years.


Across a variety of metrics, TOUs register as extremely complex. Figure 1 shows a selection of our Flesch Reading Ease (FRE) results. The higher the FRE score, the more ‘readable’ the text. FRE scores above 60 are generally considered to meet ‘Plain English’ standards. For instance, Reader’s Digest scores around 65 whereas Time magazine scores about 52. The TOUs in our dataset score substantially lower, averaging just over 30.

Figure 1: FRE, A Traditional 'Readability' Metric

Note: This figure shows that privacy policies, TOUs, and arbitration clauses register exceptionally low FRE scores, which indicate difficult text. As additional points of reference, we include FRE scores for a handful of individual TOUs, such as Snapchat and Tinder.

Despite their billing as ‘readability’ tests, traditional formulas like FRE are limited by their narrow scope of inputs (essentially, sentence length and word length). We view them as helpful points of reference but insufficient as standalone readability metrics. For that reason, we also tested the complexity of noun and verb structures in our corpus.
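To make FRE's two inputs concrete, here is a minimal Python sketch of the standard Flesch formula: 206.835 minus 1.015 times the average words per sentence, minus 84.6 times the average syllables per word. It is an illustration only, not the pipeline used in our study. The syllable counter is a rough vowel-group heuristic, and the two sample sentences are invented for the example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count runs of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing 'e' (but not 'le') when another vowel group exists,
    # so that words like "change" or "these" count as one syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)   # sentence length input
    syllables_per_word = syllables / len(words)        # word length input
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Invented sample sentences, for illustration only.
plain = "We may change these terms. We will tell you first."
legalese = (
    "Notwithstanding the foregoing, the Company reserves the unilateral right "
    "to modify, suspend, or discontinue any provision of this Agreement at its "
    "sole discretion without prior notification."
)

print(f"plain:    {flesch_reading_ease(plain):.1f}")
print(f"legalese: {flesch_reading_ease(legalese):.1f}")
```

Because the formula sees nothing but sentence length and syllable counts, two texts with very different grammatical structures can receive similar scores. That blind spot is why we treat FRE as a reference point only.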

Figure 2 shows the complexity of noun structures. We use Python and the TAASSC tool to produce these results.

Figure 2: Noun-Phrase Complexity

Note: This figure shows the results of TAASSC’s NP elaboration, a composite score of nineteen noun phrase types and embedding indices. Higher scores indicate more complex syntactic structures.

These results indicate that TOUs contain highly complex noun structures—in particular, the use of embedding in noun phrases. Embedding is a crucial factor in linguistic complexity and reading difficulty.

The next level of our syntactic analysis is verb structures. We use R and Fichtner’s C to produce these results. A key insight of the Fichtner’s C index is that the density of lexical verbs drives the syntactic complexity of a text. Our results show that TOUs feature highly complex verb structures.

Figure 3: Verb Complexity

Note: This figure illustrates our results for the Fichtner’s C index. Higher scores indicate greater complexity in verb structures. On this metric, arbitration clauses score even higher than the TOU average.
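For readers curious about what 'density of lexical verbs' means in practice, the sketch below counts lexical verbs per sentence in Python. It is a simplified proxy and not the Fichtner's C formula itself. It assumes the text has already been split into sentences and tagged with Penn Treebank part-of-speech tags. The tagged sentences, the auxiliary list, and the function name are hypothetical choices made for this example.

```python
# Forms of "be", "have", and "do" are treated as auxiliaries and skipped.
# This is a simplification: "have" used as a main verb is also skipped.
AUXILIARIES = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "have", "has", "had", "having",
    "do", "does", "did",
}


def lexical_verbs_per_sentence(tagged_sentences):
    """Average number of lexical verbs per sentence.

    Each sentence is a list of (token, tag) pairs with Penn Treebank tags.
    Verb tags start with "VB"; modals carry the tag "MD" and are excluded.
    """
    if not tagged_sentences:
        raise ValueError("need at least one sentence")
    total = sum(
        1
        for sentence in tagged_sentences
        for token, tag in sentence
        if tag.startswith("VB") and token.lower() not in AUXILIARIES
    )
    return total / len(tagged_sentences)


# Hand-tagged toy sentences, invented for illustration.
toy = [
    [("You", "PRP"), ("agree", "VBP"), ("to", "TO"),
     ("indemnify", "VB"), ("us", "PRP")],
    [("We", "PRP"), ("may", "MD"), ("have", "VB"),
     ("changed", "VBN"), ("these", "DT"), ("terms", "NNS")],
]

print(lexical_verbs_per_sentence(toy))
```

In the toy data, 'agree', 'indemnify', and 'changed' count as lexical verbs, while 'may' and 'have' do not.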

Longitudinal Trends

TOUs and privacy policies are long—and getting longer. And why not? Digital mediums have no natural constraints on length. Printing costs are a non-factor. Why not slather on another few hundred words?

Our comparisons with previous studies suggest that TOUs and privacy policies have grown substantially longer in recent years. The length of these texts imposes significant opportunity costs on users who decide to read them.
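A back-of-the-envelope calculation gives a sense of that cost. The Python sketch below takes the text and word counts reported for our corpus, computes the mean length per text, and converts it to reading time. The reading speed of 250 words per minute is an assumption for illustration, not a figure from our study, and dense legal prose is likely read more slowly.

```python
# Text counts and word counts as reported for the corpus above.
CORPUS = {
    "TOUs": (75, 504_025),
    "privacy policies": (74, 325_793),
    "community guidelines": (47, 114_641),
}

# Assumed adult reading speed in words per minute (not from the study).
ASSUMED_WPM = 250


def mean_length(n_texts: int, n_words: int) -> float:
    """Mean number of words per text."""
    return n_words / n_texts


def reading_minutes(n_words: float, wpm: int = ASSUMED_WPM) -> float:
    """Minutes needed to read n_words at the given speed."""
    return n_words / wpm


for genre, (n_texts, n_words) in CORPUS.items():
    avg = mean_length(n_texts, n_words)
    print(
        f"{genre}: about {avg:,.0f} words per text, "
        f"about {reading_minutes(avg):.0f} minutes at {ASSUMED_WPM} wpm"
    )
```

Under that assumption, the average TOU in the corpus takes roughly 27 minutes to read once, before counting the privacy policy and community guidelines that usually accompany it.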

Table 1: Word Counts of TOUs and Privacy Policies (PPs)


Our results also suggest that consumer contracts are longer, more complex, and more unilateral than ever. At the same time, certain platform-consumer TOUs function as large-scale instruments of digital governance. In these conditions, contract law and consumer reality are growing even further apart.

The ‘notice and choice’ model of consumer privacy revolves around disclosure. In the digital environment, consumers are inundated with legal texts. Our results show how platform TOUs and privacy policies contribute to that deluge. In the current environment, notice and choice are barely recognizable—perhaps mere fictions.

At a broader level, these conclusions underscore doubts about the efficacy of disclosure-based regimes for consumer protection.


Tim R Samples is an Associate Professor of the Legal Studies Program at the University of Georgia.

Katherine Ireland is the Interim Head of the DigiLab at the University of Georgia.

[1] The Brown University Standard Corpus of Present-Day American English, commonly referred to as the Brown Corpus, represents a broad spectrum of modern English and was designed for comparative studies like ours. It was compiled by Nelson Francis and Henry Kučera.

[2] The Jane Austen corpus represents a stark contrast with the other two corpora. Compiled by Project Gutenberg, it is composed of six novels written by Jane Austen: Mansfield Park, Sense and Sensibility, Emma, Pride and Prejudice, Northanger Abbey, and Persuasion (about 854,000 words in total).


