AWS Public Sector Blog

UK Biobank enables medical research worldwide through vast database powered by AWS

UK Biobank is the world’s most comprehensive source of health data used for research. It houses a vast, continuously growing dataset of biological, health, and lifestyle information. From 2006 to 2010, UK Biobank recruited 500,000 UK citizens aged 40 to 69 to supply biological samples (blood, urine, and saliva) and to provide information about their lifestyle on an ongoing basis. Research participants also consented to linkage to their health-related records.

Today, UK Biobank holds about 10,000 variables per volunteer, ranging from simple lifestyle information to physical measures, electronic health records (EHRs), genetic sequencing, biomarker data, and full-body scan images. But to reach its goal of a safe and accessible database, UK Biobank had to work out how best to accommodate all of the data in a way that met researchers’ needs. With the inclusion of whole-genome sequencing data for all 500,000 participants, the sheer size of the data (currently around 30 petabytes) meant researchers needed a way to analyze the data where it is stored, instead of downloading huge amounts.

UK Biobank needed a purpose-built data platform with compute and data-storage capabilities that provided analysis tools in a centralized environment and the flexibility to manage increasing quantities of data, allowing researchers to work on the dataset with ease. This led to the establishment and launch in 2021 of the secure, cloud-based UK Biobank Research Analysis Platform (RAP), which is hosted on Amazon Web Services (AWS) in the Europe (London) Region and enabled by DNAnexus. This post highlights UK Biobank’s journey to becoming a globally accessible dataset for health researchers.

Health data for the public interest

The altruism of research participants is at the heart of UK Biobank’s existence. The dataset’s founders and core funders champion contributors’ generosity by making the data available to researchers worldwide, thereby maximizing its benefits as an enabler of new drug discovery, diagnostics, and treatments.

All the data is de-identified and available to approved researchers for health-related research that is in the public interest. Since the database opened in 2012, more than 30,000 researchers from 90 countries have registered to use UK Biobank. So far, there have been more than 10,000 scientific publications based on researchers’ discoveries using UK Biobank data.

These discoveries cover conditions such as cancers, heart disease, chronic kidney disease, stroke, type 2 diabetes, and Alzheimer’s disease. For example, a PhD student in Boston, Massachusetts, used UK Biobank’s genotyping data (around 800,000 markers across the genome) to establish the value of polygenic risk scores (a measure of a person’s disease risk due to their genes). This kind of analysis could support earlier and more targeted interventions, for instance for heart disease or aggressive forms of cancer.
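To make the idea of a polygenic risk score concrete: it is commonly computed as a weighted sum of a person’s risk-allele counts, with one weight per genetic marker. The Python sketch below is only an illustration under stated assumptions. The marker names, effect sizes, and the `polygenic_risk_score` function are hypothetical, missing genotypes are simply counted as zero, and nothing here reflects the method or data used in the study mentioned above.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Return the weighted sum of risk-allele counts (0, 1, or 2 per marker).

    Markers missing from `dosages` are treated as 0 copies, which is a
    simplification made for this sketch.
    """
    return sum(weight * dosages.get(marker, 0) for marker, weight in weights.items())


# Hypothetical effect sizes per marker (illustrative values, not real estimates).
weights = {"marker_a": 0.5, "marker_b": -0.25, "marker_c": 1.0}

# Hypothetical participant: number of risk alleles carried at each marker.
person = {"marker_a": 2, "marker_b": 1, "marker_c": 0}

score = polygenic_risk_score(person, weights)
print(score)
```

In practice a score like this would span many thousands of markers and would usually be standardized against a reference population before being interpreted.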

New findings continue to come thick and fast: more than 3,000 reports were published in 2023. Each enhancement to the data adds to its potential for other scientists.

Continuous, collaborative innovation

UK Biobank is exploring the possibility of adopting new technologies, such as generative artificial intelligence (AI), to make its database even more accessible and digestible to researchers. Initially, generative AI algorithms may simplify and accelerate interrogating the database, for instance, through direct questions such as “How many people in UK Biobank have had a heart attack under the age of 65?” This may progress to predictive analysis, for example, “Given the cholesterol level of men over the age of 65 with obesity, what will their projected cholesterol level be in five years?”
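As a rough illustration, a direct question such as the heart attack example above ultimately comes down to a filter and a count over participant records. The Python sketch below shows that underlying step only. The `Participant` record, its field names, and the sample cohort are hypothetical and do not represent UK Biobank’s schema, the RAP’s tools, or any generative AI interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical record layout, invented for this sketch.
    participant_id: int
    age_at_first_heart_attack: Optional[float]  # None if no heart attack is recorded


def count_heart_attack_before(participants: List[Participant], age_limit: float) -> int:
    """Count participants whose first recorded heart attack was before age_limit."""
    return sum(
        1
        for p in participants
        if p.age_at_first_heart_attack is not None
        and p.age_at_first_heart_attack < age_limit
    )


# Made-up cohort of four participants.
cohort = [
    Participant(1, 58.0),
    Participant(2, None),
    Participant(3, 70.5),
    Participant(4, 64.9),
]

n = count_heart_attack_before(cohort, 65)
print(n)
```

A natural-language layer would sit in front of a query like this, translating the researcher’s question into the filter and leaving the count itself as a deterministic operation.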

UK Biobank hopes to see the development of complementary biobanks in more countries, as these are essential for capturing detail about disease progression in diverse demographics and environments. The dataset’s leadership team continues to advise scientists on how to set up similar studies and looks forward to seeing the continuing, transformative impact of the UK Biobank RAP on diagnoses, treatments, and cures around the world.

Additional reading

Watch videos on how UK Biobank was set up and how its research environment can be harnessed for greater innovation.

The AWS Institute creates a library of guides, videos, and articles featuring insights and best practices shared by public sector leaders to help their peers accelerate their transformation programs. Find more AWS Institute thought leadership for public service leaders.

Rory Collins

Sir Rory Collins is the leader of UK Biobank and a British Heart Foundation professor of medicine and epidemiology at the University of Oxford. He was appointed UK Biobank principal investigator and chief executive in 2005.

Naomi Allen

Naomi Allen is chief scientist for UK Biobank and a professor of epidemiology at the Nuffield Department for Population Health at the University of Oxford. She is primarily responsible for the linkage of electronic health records for all participants. Her research background is in cancer epidemiology.