Genomics England Fuels Critical Genome Research with the WEKA® Data Platform
Overview
WEKA Delivers the Extreme Performance Needed to Support Genomics England’s Race to Sequence 5 Million Genomes.
Genomics England (GEL) works in partnership with the United Kingdom’s National Health Service (NHS) to administer its ambitious 100,000 Genomes Project, which aims to create the world’s largest genomic database by gathering genomes from NHS patients with cancer or rare diseases and their families for analysis. The project launched in 2014, and its mission was expanded in 2018 with a target of processing an unprecedented 5 million genomes in five years to accelerate medical research and discovery. This landmark public health project is supported by a team of more than 3,000 scientific researchers and is expected to be processing a staggering 140 petabytes of DNA data by 2023.
The Challenge
Genomics England initially implemented a traditional scale-out NAS solution to support the original 100,000 Genomes Project. However, the system was already reaching its scale and performance limits under the strain of the early project parameters.
With its sequencing targets expected to expand from one to 5 million genomes in just five years, Genomics England needed to upgrade its data infrastructure to scale with the project’s projected growth and deliver the performance required to reach the new target on such a short timeline.
Additionally, Genomics England’s original data management solution had no viable disaster recovery strategy, leaving sensitive patient health data vulnerable if a security breach or disaster were to occur. Further, backing up the project’s more than 20 petabytes (PBs) of data under management was not financially feasible. Genomics England needed a flexible, secure, and highly scalable data management solution to realize its ambitious goals by 2023.
The new solution needed to:
Scale capacity in a single namespace to support up to 140 PBs of data
Enhance data pipeline performance to keep pace with research objectives
Provide native disaster recovery and data encryption
Improve cost efficiencies to support tight budget constraints
The Solution
The WEKA Data Platform With Quantum ActiveScale Object Storage
WEKA provides Genomics England with the multi-tiered architecture needed to combine commodity flash and disk-based technologies in a single unified platform solution. The primary tier consists of 1.3 PBs of high-performing NVMe-based flash storage to support its researchers’ working datasets. The secondary tier consists of 40 PBs of object storage to provide a long-term data lake repository.
The WEKA Data Platform presents Genomics England’s entire 41 PB dataset in a single namespace and provides automated data tiering to optimize workload performance and improve cost efficiency.
“We needed something that’s much more scalable than existing NAS solutions — an infrastructure that could grow to hundreds of petabytes. Our existing solution couldn’t provide that scale and wasn’t performing as well in these magnitudes — that’s what drove us to WEKA.”
Outcomes
Infinite scale to support exponential data growth
Massive performance improvements to accelerate innovation
Faster data processing and time to insight
Significantly improved cost efficiency per genome
Enhanced security with disaster recovery in place
Cloud integration for computing elasticity and data mobility