Faculty of law blogs / UNIVERSITY OF OXFORD

Health Data Privacy: Hope, Hype and Harm Workshop

Authors: Dr Miranda Mourby, Zoya Yasmine, Professor Jane Kaye, Dr Michael Morrison

Recent changes to data protection law,[1] alongside the rise of synthetic health data and the announcement of the UK Health Data Research Service,[2] call for fresh consideration of the potential Hope, Hype and Harm[3] for privacy in health research.

The Centre for Health, Law and Emerging Technologies (HeLEX), with the support of the Responsible AI team at GSK, hosted a multidisciplinary workshop on 13 May 2026, bringing together experts to explore the implications of these developments for UK health research and AI development. The workshop was organised around three themes: synthetic data, data governance models, and the legal bases for processing personal health data. Throughout the day, participants explored how unchecked Hype can mask potential Harm, while also identifying the genuine kernel of Hope in these developments that can, and should, be cultivated for societal benefit. 

Synthetic Data  

Speakers: Dr Colin Mitchell (PHG Foundation); Professor Mark Elliot (University of Manchester); Dr Puja Myles (MHRA/CPRD); Dr Tiago Sergio Cabral (University of Minho and European Commission AI Office) 

Synthetic data are artificially generated data designed to mimic the structure, properties and relationships of patient data without directly reproducing information about individuals.[4] Synthetic data are increasingly used in health research, for example to train AI models.[5] Puja Myles highlighted the Hope of synthetic data as one way to fill gaps in real-world data and so create more representative datasets, a use case of particular interest to the Medicines and Healthcare products Regulatory Agency (MHRA). However, there can be Hype in the idea that synthetic data is needed to resolve ‘data scarcity’, when high-quality real-world data remains essential for validating AI systems. A potential Harm lies in generating synthetic data from sparse or unrepresentative source datasets, which may amplify underlying biases. This suggests a need for data labelling and transparent documentation so that researchers can understand the potential risks of using a specific synthetic dataset.

[Photo: Attendees at the Health Data Privacy workshop]

Colin Mitchell suggested that synthetic data is entering the ‘trough of disillusionment’ stage of the Gartner Hype Cycle, in which early excitement gives way to more searching questions about practical value, limits and implementation.[6] He cautioned that synthetic data may cause Harm if viewed as a large supply of non-private, non-confidential and non-personal data that bypasses the need for privacy considerations. However, there is also potential Harm in viewing all forms of synthetic data as inherently personal or private, even when they are generated in ways that safeguard against the risk of individual identification. He suggested that Hope may lie in post-release surveillance of synthetic data, inspired by regulatory approaches to the post-market surveillance of medical products. Such an approach would emphasise continuous, proactive monitoring of synthetic datasets to mitigate lingering privacy and data protection risks.

Mark Elliot challenged the binary between ‘real’ and ‘synthetic’ data, pointing out that all data are generated through technologies and data-generating processes, and thus involve some degree of artifice. Rather than asking whether data are real or fake, he suggested focusing on the processes by which data are generated, and the limitations or biases these processes may introduce. He also proposed dropping the term ‘real’ as applied to data in favour of ‘original’, the term now commonly used in the synthetic data research community. Hype can therefore lie in treating synthetic data as categorically different from ‘real’ data, while Hope could lie in developing better documentation of how different kinds of data are generated (which could link to the recommendations presented by Puja Myles and Colin Mitchell above).

Tiago Sergio Cabral reflected on how the EU’s AI Act interacts with the relative definition of personal data developed by the Court of Justice of the European Union (see the Court’s decision in EDPS v SRB[7] as well as its emphasis on reasonable foresight in the Scania judgment[8]), and the implications this may have for sharing data generated by AI. He identified Hope in the generation of synthetic data for bias mitigation (potentially falling within public interest exceptions) and in the use of synthetic data to support data minimisation. He also suggested that contractual limitations in synthetic data sharing agreements could mitigate the Harm of personal data breaches arising from unlawful re-identification of patients by otherwise lawful recipients.

Governance Models  

Speakers: Dr Jessica Bell (University of Warwick); Robert Vandersluis (GSK); Professor Timo Minssen (University of Copenhagen/CeBIL) 

The second session considered the organisational models through which ‘real’ personal data is made available for health research. The processes through which data custodian bodies safeguard privacy in such data are becoming more tightly regulated at the EU level, particularly with the passage of the Regulation on the European Health Data Space. Furthermore, following the announcement of a £600 million UK Health Data Research Service last year,[9] the question remains as to how UK law should shape, support and scrutinise the bodies that make patients’ data available for research.

Jessica Bell discussed the use of a data trust structure in the pilot study for the Born in Scotland longitudinal birth cohort.[10] The model appears to offer some Hope, while also pushing lawyers to the edges of their comfort zones, given unresolved academic disagreement about whether data, or data-related rights, can be held as trust property. The pilot also suggested an empirical appetite for more participatory models of data governance, particularly in contexts where young people are rarely consulted about how their data should be governed. 

Robert Vandersluis situated UK health data governance against a wider backdrop of public controversy and media scrutiny, which has reflected perceptions of betrayal, exposure and profiteering. These stories surfaced the different forms of Harm experienced in past examples such as the Royal Free-DeepMind collaboration and care.data. By contrast, he pointed to South Australia as a health data governance environment with ‘success stories’ featuring productive public-private partnerships focused on innovation, economic development and better services. Within the UK, the experiences of Wales and Scotland[11] as well as Northern Ireland[12] may serve as more Hope(ful) precedents for the forthcoming Health Data Research Service, suggesting that health data reuse need not inevitably provoke widespread controversy if its governance is trusted, proportionate and well communicated.

Timo Minssen offered a perspective from the EU, focusing in particular on the Regulation on the European Health Data Space and its creation of Health Data Access Bodies. There is Hope in the potential for these bodies to create more structured and consistent routes for health data access, but he also warned of Harm if they lack sufficient capacity, becoming either bottlenecks or rubber stamps. His remarks also highlighted deeper questions about solidarity and sovereignty in health data governance: openness without control risks becoming exposure, while data sovereignty must mean more than the physical location of a server. 

Data Protection Rules for Scientific Research  

Speakers: Dr Alison Knight (Health Research Authority); Cassie Smith (Health Data Research UK); Elisabetta Biasin (KU Leuven Centre for IT & IP Law) 

The final session focused on the statutory bases that permit the processing of personal data in UK and EU data protection law. Through the Data (Use and Access) Act 2025, the UK government has introduced a clarified definition of ‘scientific research’, intended to make the path to secondary use of personal data more certain for commercial research organisations.[13]

Cassie Smith noted that while the Data (Use and Access) Act 2025[14] has been presented by some as a significant step forward, for health research it largely provides clarification rather than meaningfully improved access. The main barriers faced by the research community continue to stem from fragmented governance, inconsistent decision-making and organisational risk aversion. She cautioned that Harm may arise if the ICO’s draft guidance on scientific research is interpreted too narrowly: indicative criteria such as the publication of new knowledge may not map neatly onto essential research activities like linked dataset creation, curation or AI model development, and so risk excluding work that is foundational to scientific progress. She argued that the real Hope lies in sustained investment in the Health Data Research Service and in addressing the structural governance issues that legislation alone cannot resolve, creating a coherent and coordinated system with clear routes to access, proportionate oversight and consistent application of safeguards.[15]

Elisabetta Biasin provided an EU counterpoint, as the European Data Protection Board is also consulting on its long-awaited guidance on the processing of personal data for scientific research purposes. She suggested that the guidance offers Hope by giving greater structure to the concept of ‘scientific research’ within data protection law. However, it also risks Harm through three important omissions: the data protection issues raised by AI; the interplay with other EU digital laws, including the AI Act and cybersecurity legislation; and considerations of accuracy in data protection obligations such as legitimate interests assessments. She pointed to work by other bodies, including the European Medicines Agency,[16] as examples of existing reflection in this area on which the guidance could have built.

Finally, Alison Knight described how regulators and governance bodies might respond to Hope, Hype and Harm through the values of trust, scrutiny and accountability. Building on the discussion of ‘scientific research’, she noted that broader data protection concepts may offer Hope for research reuse and AI development, but can also generate Hype if they are assumed to solve the wider problem of trustworthy access to health data. Conversely, Harm may arise where misalignment between the definitions of scientific research used by policy, legal, ethics and regulatory bodies creates unclear or inconsistent governance. She emphasised that the question is not simply whether an activity can be brought within a definition of scientific research, but whether the surrounding governance remains proportionate, visible, and capable of demonstrating legitimacy. Her central proposition was that health data sharing for research and AI development should keep system values at the centre, make justifications of data use visible and intelligible, and streamline governance without losing trust.

Policy Recommendations 

In the final plenary session, participants discussed policy recommendations arising from the day’s discussions. The most ambitious proposal was statutory reform to support a combined data protection and ethical review of health data research, for example by giving Research Ethics Committees a mandate to consider both the lawfulness of data processing and the ethics of the research. This recommendation was inspired by the success of the Norwegian model.

Investment in public sector infrastructure was also identified as a policy priority, one that is particularly salient for the development of the Health Data Research Service. To operationalise this, lessons from other countries with successful centralised health data sharing models, such as Estonia and Finland, could be instructive. Participants also suggested more coordinated patient and public engagement research, so that findings about public trust and preferences in health data research are drawn not only from small studies but from a broader national evidence base that can inform policy.

Other recommendations picked up the theme of transparency: for example, using accessible language to explain research and data protection to patients and participants; publicising ‘small wins’ to counterbalance the scandals that tend to dominate media headlines; ICO case studies on synthetic data generation and the anonymisation of health data; and opening up data protection governance through the safe publication of data protection impact assessments (DPIAs) and, where appropriate, edited risk registers.

Taken together, these recommendations reflected a shared concern that Hope in health data research should be made visible and sustainable, that Hype should be tested against evidence and experience (including from other jurisdictions), and that Harm should be anticipated through governance that is lawful, technically robust, intelligible and worthy of public trust. 

Acknowledgements:  

We are very grateful to the workshop speakers who presented their research. Thank you also to the workshop facilitators who supported discussions on the day: Brandy Coote, Dr Katarina Foss-Solbekk and Dr Michael Morrison. And, of course, we are grateful to all the workshop participants who attended and contributed to the discussions.

This workshop was made possible by the generous support of GSK and their Responsible AI team (Robert Vandersluis, Dr Markus Trengove and Ella Shoup). 

Footnotes:

[1] Particularly under ss.67-86 Data (Use and Access) Act 2025, as well as the planned changes to the Health Service (Control of Patient Information) Regulations 2002 announced in the 2025 Life Sciences Sector Plan: https://www.gov.uk/government/publications/life-sciences-sector-plan/life-sciences-sector-plan

[2] Department of Health and Social Care, Prime Minister’s Office, 10 Downing Street, Department for Science, Innovation and Technology, Office for Life Sciences and Office for Investment, ‘Prime Minister turbocharges medical research’ (GOV.UK, 7 April 2025) https://www.gov.uk/government/news/prime-minister-turbocharges-medical-research. 

[3] The workshop built on the work of authors such as Robert Wachter and Timothy Caulfield, who have similarly reviewed the US landscapes of digital health and biotechnology through the prism of Hope, Hype and Harm. See Robert Wachter, The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age (McGraw-Hill Education 2017); Timothy Caulfield, ‘Spinning the Genome: Why Science Hype Matters’ (2018) 61 Perspectives in Biology and Medicine 560; Timothy Caulfield, ‘Popular Media, Biotechnology, and the “Cycle of Hype”’ (2005) 5 Houston Journal of Health Law & Policy 213.

[4] US Food and Drug Administration, ‘Digital Health and Artificial Intelligence Glossary: Educational Resource’ (26 September 2024), https://www.fda.gov/science-research/artificial-intelligence-and-medical-products/fda-digital-health-and-artificial-intelligence-glossary-educational-resource#s 

[5] Lewis Hotchkiss, ‘Synthetic Data: A Game-Changer for Dementia Research’ (Dementias Platform UK, 9 July 2025) https://www.dementiasplatform.uk/news-and-media/blog/synthetic-data-a-game-changer-for-dementia-research 

[6] Gartner, ‘Gartner Hype Cycle Research Methodology’ https://www.gartner.com/en/research/methodologies/gartner-hype-cycle 

[7] Case C-413/23 P European Data Protection Supervisor (EDPS) v Single Resolution Board (SRB) [2026] 1 CMLR 24. 

[8] Case C-251/22 P Scania AB and Others v European Commission [2024] 4 CMLR 24. 

[9] See n 2.

[10] Data Trusts Initiative, ‘Born in Scotland Data Trust’ https://datatrusts.uk/pilot-bis 

[11] Margaret McCartney, ‘Care.data: Why Are Scotland and Wales Doing It Differently?’ (2014) 348 BMJ g1702.  

[12] Brendan O’Brien, ‘Care.data: How Northern Ireland Is Doing It’ (2014) 348 BMJ g2380.  

[13] Information Commissioner’s Office, ‘Response to DCMS Consultation “Data: A New Direction”’ (6 October 2021) 25–27 https://ico.org.uk/media2/migrated/4018588/dcms-consultation-response-20211006.pdf 

[14] Particularly s.67 DUAA 2025, which took the definition of ‘scientific research’ from a recital of the General Data Protection Regulation (‘GDPR’) and gave it more prominent status in a revised Article 4 of the UK GDPR.   

[15] Cassie Smith, Andy Boyd, Christina Pagel and Andrew Morris, ‘Trust, not technology: governing access to health data as the decisive challenge for the UK’ (2026) 0 Lancet Digital Health https://www.thelancet.com/journals/landig/article/PIIS2589-7500(26)00017-8/fulltext 

[16] European Medicines Agency, ‘Reflection paper on the use of Artificial Intelligence (AI) in the medicinal product lifecycle’ (9 September 2024): https://www.ema.europa.eu/en/documents/scientific-guideline/reflection-paper-use-artificial-intelligence-ai-medicinal-product-lifecycle_en.pdf