Faculty of law blogs / UNIVERSITY OF OXFORD

Big data and algorithms: Focusing the discussion


Jay Modrall


The explosion in the collection of ‘big data’ and the use of algorithms for pricing across many industries has generated intense discussion in recent years. Broadly speaking, there are two main areas of concern: 

  • First, that the use of algorithms to process big data may play a role in restrictive agreements, decisions or concerted practices or otherwise facilitate collusion; and 
  • Second, that the collection of big datasets that are valuable, unique and non-replicable may create barriers to entry and market power for the data holder.

Issues in the first category, the main focus of an OECD roundtable in June 2017 and a report published in September 2017, are commonly referred to as ‘algorithmic collusion’ issues, while those in the second, the main focus of a Franco/German study published in May 2016 and an OECD roundtable in November 2016, are commonly referred to as ‘big data’ issues. These issues have also been discussed in countless articles and conferences, and notably in two books, ‘Virtual Competition’ (Ariel Ezrachi and Maurice E. Stucke 2016) and ‘Big Data and Competition Policy’ (Maurice E. Stucke and Allen P. Grunes 2016).

The ‘big data’/‘algorithmic collusion’ dichotomy is misleading, however, because it conceals preconceptions about the nature of big data and what companies do with them. In general, ‘algorithmic collusion’ concerns presuppose that competitors have access to substantially identical data and can use them in express or tacit collusion strategies. ‘Big data’ concerns presuppose that companies hoard data to which other companies need access to compete successfully.

This post aims to focus the debate by making the link between the types and uses of big data and potential theories of harm. A better understanding of the contexts in which ‘big data’ and ‘algorithmic collusion’ concerns are more likely to arise is especially important in view of regulatory initiatives underway in the EU and elsewhere, which risk taking an overbroad approach with unintended consequences. These issues are developed in more detail in my June 2017 article in Competition Policy International.

Big data characteristics and uses

Although big data are commonly defined by reference to the ‘four Vs’ (volume, velocity, variety and veracity), the subject of the information, how it is gathered, and characteristics affecting the value of the data for particular purposes may be more relevant for antitrust purposes. Big data discussions tend to assume that big data relate to individuals’ online activities, but big data often relate to objects, particularly in the Internet-of-things context. Concerns also typically assume that big data are collected on customers or other third parties (‘third-party data’), but companies often collect big data on their own products or assets, such as car or truck fleets (‘first-party data’). 

Four other characteristics often cited in discussions of big data are (i) the ‘non-rivalrous’ nature of data (ie, the fact that an unlimited number of entities may in principle hold the same data); (ii) the ubiquity of data (ie, the fact that the same or similar data may be available from multiple sources); (iii) the decreasing marginal value of additional data (ie, the fact that beyond a certain volume, additional data may add less incremental value); and (iv) the tendency for data’s value to decline rapidly over time. However, the relevance of these characteristics is highly fact specific. 

For instance, while data are technically ‘non-rivalrous’ in the sense that one company’s ‘ownership’ of a particular data point does not preclude another company from owning the same data point, in practice many datasets are proprietary and cannot be replicated. But there is nothing inherently anti-competitive about proprietary datasets; indeed, companies normally do have proprietary datasets on their own products and services. Further, while similar consumer data may be available from multiple sources, big data are not inherently ‘ubiquitous’; companies’ proprietary data on their own products and assets are not available from multiple sources, but that does not make them anti-competitive.

Similarly, claims about the diminishing incremental value of big data and the decline of big data’s value over time depend on the nature of the data and the purpose for which they are used. For example, if a company uses data on the movements of potential customers to ‘now-cast’ promotions as people approach a retail outlet, each new data point may have equal value (no diminishing incremental value), but that value will be very short-lived. On the other hand, the same location data collected for a different purpose, for instance an analysis of consumer foot traffic in particular locations, may have longer term value but exhibit diminishing incremental value beyond a certain sample size.
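
The diminishing incremental value of data in the foot-traffic example can be illustrated with a simple statistical sketch. It assumes, purely for illustration, that the company estimates average foot traffic from independent observations, so that the precision of the estimate improves only with the square root of the sample size. The function names and the figure for the spread of visitor counts are hypothetical and are not drawn from any of the studies discussed here.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean based on n independent observations."""
    return sigma / math.sqrt(n)

def marginal_gain(sigma: float, n: int, extra: int) -> float:
    """Reduction in standard error from adding `extra` observations to n."""
    return standard_error(sigma, n) - standard_error(sigma, n + extra)

# Hypothetical spread (standard deviation) of hourly visitor counts.
SIGMA = 10.0

# The same 100 additional observations, added to a small and a large dataset.
gain_small = marginal_gain(SIGMA, 100, 100)
gain_large = marginal_gain(SIGMA, 10_000, 100)

print(f"Gain from 100 extra observations on top of 100:    {gain_small:.4f}")
print(f"Gain from 100 extra observations on top of 10,000: {gain_large:.4f}")
```

Under these assumptions, the same batch of new observations improves the estimate far less once the dataset is already large. The now-casting use described above would not fit this model, since each data point there is valuable in its own right rather than as a contribution to an average.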

In sum, generalizations about big data giving rise to antitrust concerns need to be taken with a grain of salt. The antitrust-relevant characteristics of big data depend on the data’s subject and how they are collected and used. 

Big data, algorithms and theories of harm

As mentioned, antitrust authorities and commentators concerned about big data and algorithms make two broad assertions: (i) the collection and exploitation of data may increase the likelihood of express or tacit collusion by increasing market transparency and enabling high-frequency trading and (ii) the collection and exploitation of data may raise barriers to entry and be a source of market power.  These concerns are summarized briefly below.

Algorithms and collusion

The 2016 Franco/German study and the 2017 OECD report argue that greater information on competitor pricing may limit competition, for example by enhancing the stability of existing collusive arrangements or facilitating collusion when data are used to fix prices through the use of algorithms. In transparent markets, companies can more easily monitor each other’s actions, and frequent interactions enable them to punish deviations. High-frequency interactions allow firms to make price adjustments very quickly, allowing for an immediate retaliation to deviations from collusion. 
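
The economic intuition behind this argument can be sketched with the standard textbook condition for sustaining collusion in a repeated game under a ‘grim trigger’ strategy. This is a stylized illustration rather than a model taken from either study: the profit figures, the discount rate and the function names are all hypothetical assumptions. The point is simply that the more often firms can adjust prices, the less they discount the next period, and the more easily the condition is met.

```python
import math

def critical_discount_factor(pi_collude: float, pi_deviate: float, pi_punish: float) -> float:
    """Minimum per-period discount factor at which collusion is sustainable
    under a grim trigger strategy (textbook repeated-game condition)."""
    return (pi_deviate - pi_collude) / (pi_deviate - pi_punish)

def per_period_discount_factor(annual_rate: float, periods_per_year: int) -> float:
    """Discount factor between two consecutive price adjustments,
    assuming continuous discounting at `annual_rate`."""
    return math.exp(-annual_rate / periods_per_year)

def collusion_sustainable(annual_rate: float, periods_per_year: int, threshold: float) -> bool:
    """True if firms are patient enough, per period, to sustain collusion."""
    return per_period_discount_factor(annual_rate, periods_per_year) >= threshold

# Hypothetical duopoly: each firm earns 0.5 when colluding, 1.0 in the period
# in which it undercuts its rival, and 0.0 once punishment begins.
threshold = critical_discount_factor(pi_collude=0.5, pi_deviate=1.0, pi_punish=0.0)

# Hypothetical, deliberately high annual discount rate (very impatient firms).
ANNUAL_RATE = 1.0

for periods in (1, 12, 365):
    ok = collusion_sustainable(ANNUAL_RATE, periods, threshold)
    print(f"{periods:>3} price adjustments per year: sustainable = {ok}")
```

In this sketch, firms that can reprice only once a year cannot sustain collusion, while the same firms repricing monthly or daily can, because a deviation is punished almost immediately. Whether real markets behave this way depends on the factors discussed in the remainder of this section.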

Most of the debate around algorithms and collusion concerns express collusion, for example where algorithms are used to implement or monitor price fixing agreements. For instance, the UK CMA found that Trod Ltd and GB eye Ltd agreed not to undercut each other’s prices on posters and frames sold on Amazon’s UK website and used automated re-pricing software to monitor and adjust their prices, making sure that neither was undercutting the other. While interesting, these concerns generally do not raise novel conceptual issues from an antitrust perspective (as opposed to problems of proof), since express collusion would be illegal with or without the use of algorithms.

More difficult issues arise in relation to ‘tacit collusion,’ which normally falls outside antitrust law. Some have proposed revising the concept of agreement to incorporate ‘meetings of minds’ that are reached with the assistance of algorithms. So far, however, such proposals have not received broad support. 

As mentioned, concerns about the use of algorithms for illegal collusion do not apply equally to all types of big data. These concerns mainly relate to third-party data, especially consumer data, where competitors are more likely to have access to the same or similar data. These concerns also seem most relevant to markets where large volumes of pricing data are publicly available and products are relatively undifferentiated, in particular retail, consumer-oriented businesses. By contrast, the collection of first-party data on companies’ own assets, products and services would not be expected to give rise to collusion concerns. Similarly, concerns seem less likely to arise in business-to-business markets where price competition takes place through less transparent methods or where competitors are competing on factors other than price.  

In other words, the antitrust concerns raised about the use of algorithms to process big data are not equally applicable to all types of data, but are most relevant to third-party data that are non-rivalrous and ubiquitous, not to big data that are unique and non-replicable.

Big data and abuse of dominance 

The potential for big data to create market power or entry barriers when new entrants are unable either to collect or buy access to the same kind of data as established companies is a main focus of the 2016 Franco/German study. The study notes that the importance of big data in raising entry barriers depends on market structure and consumer practices, but does not discuss whether specific types of big data are more or less likely to create barriers to entry.   

In practice, however, whether big data can create barriers to entry depends to a significant extent on the nature and proposed use of the data. Where companies collect first-party data in the course of designing and testing their own products, those data may confer a competitive advantage; however, any barrier for other companies would arise not from the dataset as such, but from the need for those companies to invest in the collection and analysis of similar data on their own products to be competitive. Where companies collect data from others (for example, by tracking customers’ activities on online platforms), whether those data constitute a barrier to entry for competitors would depend on factors such as competitors’ ability to obtain substitutable data from alternative sources (are the data non-rivalrous and ubiquitous?) and the volume and time-sensitivity of such data for the intended purpose (do returns diminish with incremental data, and does value diminish over time?). 

The Franco/German study discusses a number of ways in which business practices in relation to big data could give rise to an abuse of dominance, including refusal to provide access and exclusive contracts. The Franco/German study suggests that a company’s refusal to provide access to data can be anticompetitive ‘if the data are an ‘essential facility’ to the activity of the undertaking asking for access.’ These requirements would be met ‘if it is demonstrated that the data owned by the incumbent is truly unique and that there is no possibility for the competitor to obtain the data that it needs to perform its services.’ 

But this statement is overly broad. For a duty to provide access to data to arise under EU law, the company holding the data would need to hold a dominant position, and the competitor seeking access to the data would have to require access to the data to develop a new product or service in another market for which there is potential consumer demand. In other words, EU law would not oblige a company to share its data to allow a competitor better to compete with it in the same market. Even if the company seeking access wanted to use data to offer a new product or service, it would have to show that it would be impossible or unreasonably difficult to develop the new product or service without access to the dominant company’s data and that a refusal to grant access would exclude all competition for the new product or service. The fact that the dominant company’s data are ‘unique’ would not suffice to create a duty to provide access. Again, many companies collect unique and valuable datasets on their own products.

Similarly, the Franco/German study argues that exclusive agreements or networks of agreements involving data access may infringe antitrust laws if they prevent rivals from accessing data or foreclose rivals’ opportunities to procure similar data by making it harder for consumers to adopt their technologies or platforms. Following the approach of the Commission’s guidelines on its priorities in enforcing Article 102 TFEU in relation to exclusionary conduct, exclusive contracts to procure data could potentially give rise to concern where a contract or network of contracts had the potential to foreclose competitors’ access to data required for them to compete effectively. The potential for individual exclusive contracts, or a network of exclusive contracts, to interfere with competitors’ access to data would need to be assessed in light of the substitutability of different types of data for the same purpose, and the potential sources for each type of data. Assessing the substitutability of first-party data may be challenging, because traditional economic tools may not apply. It will also be important to distinguish the need for data from the need for algorithms or other software to process such data, which may be developed in-house or acquired from vendors.

In any case, when assessing the effects of such a contract or network of contracts, competition authorities would need to consider the potential foreclosure effects in light of the purpose for which the data are required. For instance, foreclosing access to data from one population of users may not affect competition to develop products where big data are important but the identity of the data provider is not, such as developing search algorithms, spell-check programs or voice-recognition software. Even in markets where the identity of users is important, such as individually targeted advertising services, the competitive effect of exclusivity in agreements providing for the collection of data from third parties would need to be assessed in light of the population of potential users, the availability of substitutable data from other sources (for instance as a result of multi-homing) and how quickly the value of the data in question diminishes.


In short, the ‘big data’/‘algorithmic collusion’ dichotomy is misleading and potentially dangerous. Whether competition concerns arise from the use of algorithms to process big data depends on the characteristics of the data and how companies use them. A closer look at alleged theories of harm suggests that concerns would arise only with respect to more limited categories of information and practices than broad-based comments might indicate. Overly broad discussions of big data and algorithmic collusion may also provoke overbroad regulatory responses. Regulators should take care that any proposed rules limiting the use of big data are carefully tailored to realistic concerns to avoid chilling innovation and other unintended consequences.

Jay Modrall is Partner at Norton Rose Fulbright. 

