You may be part of a data supply chain even if you are unaware of it.
As a social media user, you may be a supplier of data when a platform collects information about your behavior and either sells that data to other companies or retains it for its own commercial purposes. As a company, you may believe you are simply purchasing data from a single counterparty—without realizing that your supplier itself relies on multiple upstream sources, each with its own legal, contractual, or geopolitical risks. In both cases, suppliers and buyers are embedded in a data supply chain, whether or not they recognize it.
This lack of visibility matters. Data is a critical resource for companies seeking to develop new goods and services and to ensure that existing products continue to function. Many of the next-generation technologies that firms and governments anticipate, particularly in artificial intelligence, automation, and digital infrastructure, depend on large volumes of data in much the same way that earlier industrial technologies depended on oil, steel, or electricity.
Just as firms must secure reliable access to semiconductors and other physical inputs, they must now secure reliable access to data. Yet unlike physical supply chains, data acquisition is rarely conceptualized as a supply chain problem. This conceptual gap leaves firms, regulators, and policymakers ill-equipped to understand the dependencies and vulnerabilities that increasingly shape modern economic activity.
Data as a Critical Resource
AI systems, recommendation engines, logistics platforms, financial technologies, and many consumer-facing digital products do not run solely on software or hardware—they run on continuous streams of data. Interruptions in data access, changes in data legality, or the withdrawal of upstream suppliers can impair or even disable downstream products—just as disruptions in physical supply chains can halt manufacturing.
Despite these risks, both academic and policy attention to data governance focuses overwhelmingly on downstream uses of data, such as privacy and consumer protection. These are important concerns, but they address harms only after a company has acquired the data. This focus does little to illuminate how data is sourced upstream or the risks associated with different acquisition methods.
Five Methods of Data Acquisition
Companies obtain data in five distinct ways: build, buy, scrape, surveil, and generate. In practice, most firms rely on multiple methods simultaneously, often without treating them as part of an integrated system.
1. Building Data
A ‘build-it’ model involves a company using its own technology to acquire data that it then incorporates into its goods and services.
2. Buying Data
A ‘buy-it’ model involves a company purchasing data from another company, such as a data broker. The purchaser may even acquire or merge with the other company in order to obtain its data.
3. Scraping Data
A ‘scrape-it’ model involves companies ‘scraping’ data from the internet. A scraper develops a computer program that directly extracts, copies, and aggregates content from the source code of various websites for later use or sale.
4. Surveillance-Based Data Collection
A ‘surveil-it’ model generally relies on devices and applications that collect information on individual users, such as smart home devices, watches, and apps. People use this technology to ‘self-surveil’ so that they can track personal goals, such as miles run or calories consumed, but the same data is also collected by the companies that sell these devices. Moreover, even internet service providers (ISPs) may collect a surprising amount of data from individual homes and bodies without users' knowledge.
5. Generating Synthetic or Derived Data
Finally, a ‘generate-it’ model relies on generative AI outputs as data inputs for other software uses. Alternatively referred to as synthetic data or ‘AI slop’, this is what results when users employ generative AI to produce outputs, such as text, images, and video, that are then turned into inputs for future AI applications.
Each of these acquisition methods carries its own risks. The more important—and often overlooked—problem arises when companies combine them.
For example, a company may believe it can easily replace one data supplier, only to discover that multiple vendors rely on the same upstream source, or that legal changes affect an entire category of data acquisition simultaneously. These challenges are familiar from supply chain risk in physical goods, but they are far less visible in the data context.
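The shared-upstream problem described above can be made concrete with a small sketch. The Python below is illustrative only: the supplier map and every name in it (vendor_a, broker_x, platform_p, and so on) are hypothetical, and a real mapping exercise would depend on what suppliers actually disclose. It walks each vendor's upstream sources and reports any source that more than one vendor ultimately relies on.

```python
from collections import defaultdict

# Hypothetical supplier map (an assumption for illustration):
# each entry lists the sources that a supplier draws on directly.
SUPPLIERS = {
    "vendor_a": ["broker_x"],
    "vendor_b": ["broker_y"],
    "broker_x": ["platform_p"],
    "broker_y": ["platform_p", "scraper_s"],
}


def upstream_sources(vendor, suppliers):
    """Return every direct and indirect source that a vendor depends on."""
    seen = set()
    stack = list(suppliers.get(vendor, []))
    while stack:
        source = stack.pop()
        if source not in seen:  # also guards against circular relationships
            seen.add(source)
            stack.extend(suppliers.get(source, []))
    return seen


def shared_dependencies(vendors, suppliers):
    """Map each upstream source to the vendors relying on it, keeping only
    sources that two or more of the given vendors have in common."""
    dependents = defaultdict(set)
    for vendor in vendors:
        for source in upstream_sources(vendor, suppliers):
            dependents[source].add(vendor)
    return {src: sorted(vs) for src, vs in dependents.items() if len(vs) > 1}


shared = shared_dependencies(["vendor_a", "vendor_b"], SUPPLIERS)
print(shared)
```

In this toy map, the two vendors look like independent alternatives, yet both trace back to the same platform, so switching from one to the other would not remove the dependency.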
Why the Supply Chain Lens Matters
Recognizing data acquisition as a supply chain has important implications for corporate governance. It means that data sourcing can be managed, governed, and regulated in the way that supply chains for physical goods are. Corporate management should understand the structure of their data supply chains, evaluate their data suppliers, identify critical risks to those chains, and invest in resiliency capabilities. This is a priority for corporate boards and senior management, and it should be distinguished from the management of cyber risks.
- First, recognizing data supply chains reframes data risk as a structural dependency problem, not merely a compliance issue. Risks associated with data sourcing often emerge long before any downstream harm occurs, and they may persist even when firms comply with existing regulations.
- Second, it helps to identify new priorities for corporate boards and senior management. They are well accustomed to overseeing supply chain risk in physical inputs, and they should do the same for data. Extending this oversight to data supply chains requires mapping upstream suppliers, identifying and assessing risks, and investing in data supply chain resilience.
- Third, it distinguishes data supply chain risk from cybersecurity risk. While cybersecurity focuses on protecting data once it is held by the company, data supply chain governance focuses on how data enters the company in the first place. The two are distinct and governance of the former does very little to address the latter.
For policymakers, data supply chains are vital to national security policy because they are the foundation for critical digital infrastructure. Increasingly, society and the modern economy rely upon digital infrastructure such as the internet, SMTP, information platforms, digital maps, and digital payment networks, among others, to facilitate core arenas of economic, social, and political life. As society increasingly depends on them, these technologies transform from emerging technologies, rarely understood, into daily technologies, frequently taken for granted. The reality is that the vitality of strategic digital infrastructure depends on the resiliency and efficiency of data supply chains managed by private corporations. A disruption in those chains could suddenly render inaccessible many aspects of digital economic life that society assumes will always be available.
But most digital infrastructure is built and operated by private corporations. To mitigate these risks, policymakers should identify the types of critical technology and critical infrastructure that are supported by data supply chains and incentivize the private sector to ensure that these supply chains are resilient. That would require the companies developing these technologies and providing digital infrastructure to identify the structure of their data supply chains (including suppliers and sub-suppliers), categorize the different types of risks that may endanger those chains, take steps to prevent, mitigate, or address these risks, and invest in supply chain resiliency capabilities. A failure to do so not only impairs the financial success of the companies that control these supply chains; it also compromises the security of the nation.
The framework of a data supply chain also offers policymakers concerned about privacy invasions potential regulatory strategies at the data supply chain level that can mitigate the social harms associated with data acquisition. Applying supply chain thinking to data does not require abandoning existing privacy or AI governance frameworks. Rather, it complements them by addressing harms at their source. It also provides regulators with tools, such as mapping, disclosure, and due diligence obligations, that are already used to address social harms in physical supply chains.
Seeing data supply chains for what they are is a first step toward governing them more effectively. It is also a necessary step for companies and policymakers seeking to manage the risks—and realize the benefits—of an increasingly data-dependent economy.
The full paper can be accessed here.
Carla L Reyes is an Associate Professor of Law at SMU Dedman School of Law.
Kish Parella is the James P Morefield Professor of Law at the Washington and Lee University School of Law.