Faculty of law blogs / UNIVERSITY OF OXFORD

Law and Autonomous Systems Series: Biometric data-matching risks and the rise of self-sovereign identity

Reporters are in the habit of looking at emerging technologies in isolation when describing their benefits, and that’s certainly the case with biometrics, blockchains, smart contracts, artificial intelligence, and the internet of things—some of the key enablers of autonomous systems. To some extent, technology advocates and the press have both been training readers to think in terms of the discrete capabilities of these technologies individually.

But attorneys have to deal with what happens once new technologies are blended together as part of a broader mix, at scale, in systems operating in real-world situations. Some of the thorniest legal issues arise when powerful technologies are combined and the combination gains broad adoption. What happens, for example, when networking, edge processing and storage become pervasive (as in an internet of things scenario), the cost of data processing drops to near zero as a result, and biometric data matching at the same time goes mainstream as a way to authenticate, authorize and grant access to users? For those who are rushing to capitalize on this sort of opportunity, privacy concerns won’t be top of mind. Will cooler heads prevail?

This post takes a look at biometric matching and related personal data management challenges, and compares and contrasts the different ways biometric data is used and stored for authentication, authorization and access control purposes. It considers the privacy implications of the different storage techniques in use, and how advanced identity management principles and an emerging set of personal data management techniques known as self-sovereign identity could facilitate compliance with regulations such as the EU General Data Protection Regulation (GDPR).

Uses of Biometric Matching

The convenience of biometric matching is undeniable. Today, a fingerprint scan or voiceprint identification automatically unlocks your mobile device. An iris scan allows a coffee shop to recognize you as a customer, automatically offer you a drink you might like, and charge your account for the items you select.

The need for some form of biometric matching is also undeniable. Whole segments of society are frequently on the move and crossing borders or entering controlled access areas daily. US Customs and Border Protection says it checked the credentials of 1,069,266 individuals on a typical day in Fiscal Year 2016. For its part, the EU is facing the prospect of 300 million border crossings a year by 2025, one of the reasons behind EU Smart Borders, planned for implementation starting in 2020.

Given evident technological advances, hiring humans to check every passport or identity card visually makes less and less sense. As biometrics are unique to each individual and verifiable with a scan, biometric matching at the border or at airport security stations can reduce wait times and the resources dedicated to checking credentials substantially.

Types of Biometric Matching

Biometric matching types differ based on where the biometric data is stored:

  1. On-device biometric matching, in which the data remains on an individual user’s device. A typical example would be Apple’s iPhone, which authenticates the user and unlocks the phone with the help of biometric (fingerprint or facial) data stored and encrypted on the phone itself.
  2. On-network biometric matching, in which the data is stored in encrypted form on a networked server. The US, for example, collects and stores the fingerprints of visitors entering the country. India, for its part, stores biometric data on all its citizens.

Associated Privacy Risks

In a 2016 white paper on the topic, produced on behalf of authentication provider Nok Nok Labs, PwC Legal LLP compared on-device with on-server (on-network) biometric matching and concluded that on-device matching carries the lower risk. Among the reasons listed are these:

  • Central databases used in on-network matching can store biometric data on billions of people. A single data breach can therefore compromise a huge number of records.
  • On-device storage (of the kind used by the Apple iPhone, for example) stores only the data associated with a single individual.
  • Numerous data breaches have been reported involving the loss of biometric data of millions of individuals in the US, Israel, and India, for example. In each case, the biometric data involved was stored on networked servers.

For reasons like these, the EU and Canada, the white paper points out, “strongly advise against the storage of biometric data on large databases.”
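The difference in exposure between the two storage models can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Device` and `CentralServer` classes, the toy `matches` function, and the use of short feature vectors in place of real biometric templates are hypothetical and are not drawn from the white paper or from any vendor's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical feature vectors stand in for real biometric templates.
Template = tuple[float, ...]


def matches(stored: Template, sample: Template, tolerance: float = 0.1) -> bool:
    """Toy matcher: every feature must be within `tolerance` of the stored value."""
    return len(stored) == len(sample) and all(
        abs(a - b) <= tolerance for a, b in zip(stored, sample)
    )


@dataclass
class Device:
    """On-device matching: the template stays on the owner's device."""
    owner: str
    template: Template

    def authenticate(self, sample: Template) -> bool:
        # Only the yes/no result needs to leave the device.
        return matches(self.template, sample)

    def records_exposed_by_breach(self) -> int:
        return 1  # a compromised device holds one person's template


@dataclass
class CentralServer:
    """On-network matching: one server holds every enrolled template."""
    templates: dict[str, Template] = field(default_factory=dict)

    def enroll(self, user_id: str, template: Template) -> None:
        self.templates[user_id] = template

    def authenticate(self, user_id: str, sample: Template) -> bool:
        stored = self.templates.get(user_id)
        return stored is not None and matches(stored, sample)

    def records_exposed_by_breach(self) -> int:
        return len(self.templates)  # a single breach exposes every template
```

In this sketch a breach of one `Device` exposes a single template, while a breach of the `CentralServer` exposes as many templates as there are enrolled users.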

New Blockchain-Related Privacy Risks

Blockchains are shared ledgers (distributed transactional data stores that enforce, and give all users a view into, one version of transaction truth) that store and encrypt transactions in a block-interlinked way that’s tamperproof. (That’s not to say all systems that have blockchains are entirely secure. The account security management associated with the wallets used with blockchains, for example, has typical public key infrastructure (PKI) security vulnerabilities. More on that distinction later in this post.)
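The block-interlinked structure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular blockchain: the function names `block_hash`, `append_block` and `is_valid` are hypothetical, and the sketch leaves out consensus, networking and encryption entirely. It shows only how linking each block to the hash of its predecessor makes after-the-fact changes detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, payload: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: str) -> dict:
    """Add a block that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def is_valid(chain: list) -> bool:
    """Recompute every hash and check every link back to the start."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Changing the payload of an earlier block invalidates that block's hash, and recomputing the hash breaks the link from the next block, so `is_valid` returns `False` in either case.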

The GDPR includes in Article 17 the right to erasure, also known as the right to be forgotten, and in Article 20 the right to data portability. These are requirements blockchains cannot comply with, yet another reason to avoid storing biometric or other personal data directly on a blockchain. Blockchains are designed to store data immutably. In other words, each record is stored permanently. Any data stored once on a blockchain won’t be portable or erasable after that point. Any change to the record on a blockchain requires an amending record. The initial record stays locked in place.
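The amend-rather-than-edit behavior described above can be modeled with a short Python sketch. The `AppendOnlyLedger` class and its method names are hypothetical, chosen for illustration only, and the sketch assumes that every amendment refers to the identifier of the original record. The point it makes is that an amendment changes the current view of a record while the original entry remains in the history, with no operation available to remove it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    record_id: int
    value: str
    amends: Optional[int] = None  # id of the original record being amended


class AppendOnlyLedger:
    """Records can be added but never edited or removed."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def append(self, value: str, amends: Optional[int] = None) -> Record:
        if amends is not None and not (0 <= amends < len(self._records)):
            raise ValueError("cannot amend a record that does not exist")
        record = Record(len(self._records), value, amends)
        self._records.append(record)
        return record

    def current_value(self, record_id: int) -> str:
        """Return the latest amendment of a record, or the original value."""
        value = self._records[record_id].value
        for record in self._records:
            if record.amends == record_id:
                value = record.value  # a later amendment overrides earlier ones
        return value

    def history(self) -> tuple[Record, ...]:
        """Everything ever written, including superseded originals."""
        return tuple(self._records)
```

A request to erase the original value has nothing to act on here: the only available operation is another `append`.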

In spite of the central database storage warnings from the EU and Canada, the personal data protection rights provisions in the GDPR, and the immutable nature of blockchain transactional recordkeeping, some companies are unthinkingly storing personal data on blockchains, assuming that their encrypted and tamperproof recordkeeping will protect the stored data. Some companies offering cryptocurrency, for example, are requiring on-network biometric matching to authenticate users. Others are even suggesting users store their DNA data on a blockchain. Still others are downplaying the warnings, saying they’re overstated.

The weaker blockchain-based systems are prone to hacking just as badly implemented databases are. In particular, wallets used for blockchain access can be vulnerable; after all, users have to have a means to access their blockchain accounts, and the means of that access can be stolen. For example, users each have a private key to unlock and use their blockchain accounts, and that key generation and management process is not foolproof. Wallet providers often store the private keys in a wallet file on the user’s hard drive, in a commonly known or easily guessed directory that’s vulnerable to malware.

A February 2018 survey of the security of blockchain systems by three researchers from universities in China pointed out that “If the private key is stolen by criminals, the user’s blockchain account will face the risk of being tampered by others.” A user in such a case can lose control of her key entirely.

Identity management weaknesses are another risk factor that has plagued early open blockchains. Elsewhere in the report, the researchers detailed the anonymous nature of bitcoin transactions as a security risk. A January 2018 paper from researchers at the University of Sydney, University of Technology Sydney, and the Stockholm School of Economics in Riga estimated that 25 percent of bitcoin blockchain users are predominantly involved in illegal activity.

At the same time, data is becoming easier and easier to generate, replicate and transfer. Data proliferation increases the likelihood of a data breach, simply because the data at some point becomes too voluminous to be manageable.   

The Trend toward Decentralization and Self-Sovereign Identity

Of course, blockchain technology offers benefits too. Well-implemented systems based on sound identity, data and security governance principles can address the kinds of issues noted above. Much depends on careful, holistic system design that considers the overall security environment and how to improve that environment with the help of newer technologies. Used sparingly in conjunction with off-chain approaches, purpose-built, decentralized blockchains can help make ownership and control of your own personal data more feasible and enable regulatory compliance at the same time.
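One commonly discussed way to combine on-chain and off-chain storage is to keep the personal data in an erasable off-chain store and place only a salted hash of it on the ledger. The Python sketch below illustrates that pattern under explicit assumptions: `OffChainStore`, `commitment` and `verify` are hypothetical names, a plain list stands in for the ledger, and nothing here describes how any specific product works. Whether a salted hash left on a ledger still counts as personal data is a legal question this sketch does not settle.

```python
import hashlib
import secrets


def commitment(salt: bytes, data: bytes) -> str:
    """Salted hash that can go on the ledger in place of the data itself."""
    return hashlib.sha256(salt + data).hexdigest()


class OffChainStore:
    """Erasable store under the data holder's control (hypothetical)."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[bytes, bytes]] = {}  # ref -> (salt, data)

    def put(self, data: bytes) -> tuple[str, str]:
        """Store data off chain; return a reference and the on-chain commitment."""
        ref = secrets.token_hex(16)
        salt = secrets.token_bytes(32)
        self._entries[ref] = (salt, data)
        return ref, commitment(salt, data)

    def get(self, ref: str):
        return self._entries.get(ref)

    def erase(self, ref: str) -> None:
        """Delete both the data and the salt needed to recompute the hash."""
        self._entries.pop(ref, None)


def verify(store: OffChainStore, ref: str, on_chain_commitment: str) -> bool:
    """Check that the off-chain data still matches what was committed."""
    entry = store.get(ref)
    if entry is None:
        return False  # data has been erased; only the commitment remains
    salt, data = entry
    return secrets.compare_digest(commitment(salt, data), on_chain_commitment)
```

After `erase`, the ledger entry is unchanged, but the data and salt it referred to are gone, so the commitment can no longer be checked against anything.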

Self-sovereign identity approaches provide a prime example of how users can begin to own and control their own data with the help of blockchain technology. The notion of “self-sovereign identity” designates the user as the sovereign: the owner and controller of her own personal data and the nurturer of her own identity.

One key characteristic of self-sovereign identity is decentralization. As Phil Windley, chair of the Sovrin Foundation, points out, “so long as we insist on creating huge honeypots of valuable data, hackers will continue to target them. And since no security is perfect, they will eventually succeed.” Decentralizing the storage of personal data makes that data inherently less vulnerable to theft.

Another attractive characteristic is unlinked identifiers, or identifiers that can’t be tied to other online data about you. These unlinked identifiers can be used on public blockchains designed for the purpose. Those blockchains include the Sovrin Network and Veres One.
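A simple way to picture unlinked identifiers is a wallet that generates a separate random identifier for each relationship. The sketch below is an assumption-laden illustration in Python: the `IdentityWallet` class and its `identifier_for` method are hypothetical and do not reproduce the identifier formats used by the Sovrin Network or Veres One.

```python
import secrets


class IdentityWallet:
    """Keeps one independent random identifier per relationship."""

    def __init__(self) -> None:
        self._identifiers: dict[str, str] = {}

    def identifier_for(self, relationship: str) -> str:
        """Return the identifier used with this party, creating it on first use."""
        if relationship not in self._identifiers:
            # Random values carry no information that links them to each other.
            self._identifiers[relationship] = secrets.token_hex(16)
        return self._identifiers[relationship]
```

Because each identifier is generated independently, two parties comparing notes cannot tell from the identifiers alone that they are dealing with the same person.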

These identifiers allow the safe transmission of credentials from an issuer (such as the Department of Motor Vehicles in the US), via the user, to a verifier (such as a bartender verifying that the user is of drinking age).

In our February 2018 Strategy + Business interview with Windley, I asked how the Sovrin approach treats biometric data. Windley stated that “Our architecture never stores personally identifiable information (PII) on the ledger itself. Of course, PII includes biometrics. So biometrics are just a subcategory of a large group of things that we won’t store on the ledger.” By using the Sovrin approach, biometric data can stay off chain and out of a central database.

Others are also moving in a comparable direction toward decentralization and more careful treatment of biometric and other personal data. In January 2018, Microsoft announced it was joining the ID2020 Alliance. The goal of the Alliance is to provide 1.2 billion people who don’t have a legal identity with the opportunity to obtain one. Microsoft’s decentralized approach is comparable to Sovrin’s, except that Microsoft will offer an encrypted identity hub that can store users’ personal data. Presumably this hub will not be blockchain-based.

Conclusion: Balancing Demand for Smart Access with Personal Data Controls and Compliance

“Information wants to be free,” Whole Earth Catalog founder Stewart Brand famously said at the first Hackers Conference in 1984. The information landscape has changed just a bit since then, and the definition of a hacker as well.

But the overall premise, albeit with many qualifications these days, continues to be valid. Better data management will have to be accomplished while simultaneously facing the prospect of more personal data proliferation. More and more systems will allow consumers smart access, to be identified through biometric means as they enter a store, an airport, or a border crossing checkpoint, or approach a car they just rented. Both consumers and providers will come to expect that level of convenience. To enable that convenience, biometric data will need to be stored in more places.

Millions of consumers already use biometric matching techniques, and it’s clear we’ll see broader adoption soon. Will the better data management principles of decentralized personal data, GDPR-compliant storage, and unlinked identifiers spread and gain adoption as well? Much depends on sufficient awareness raising and cooler heads prevailing. Both biometrics and blockchain technologies are associated with better security. But as the risks and mitigations described in this article should make clear, it is thoughtful, informed systems design and a data-centric approach to architecture that really win the day.

Alan Morrison is a Senior Research Fellow, Emerging Tech at PwC.
