Genomics England Fuels Critical Genome Research with the WEKA® Data Platform

Overview

WEKA Delivers the Extreme Performance Needed to Support Genomics England’s Race to Sequence 5 Million Genomes.

Genomics England (GEL) works in partnership with the United Kingdom's National Health Service (NHS) to administer the ambitious 100,000 Genomes Project. The project aims to create the world's largest genomic database by gathering genomes from NHS patients with cancer or rare diseases, and from their families, for analysis. Originally launched in 2014, the project's mission was expanded in 2018 with an unprecedented target of processing 5 million genomes in five years to accelerate medical research and discovery. This landmark public health project is supported by more than 3,000 scientific researchers and is expected to be processing a staggering 140 petabytes of DNA data by 2023.

The Challenge

Genomics England initially implemented a traditional scale-out NAS solution to support the original 100,000 Genomes Project. However, even under the project's early parameters, that system was already reaching its scale and performance limits.

With its sequencing target expected to grow from one million to 5 million genomes in just five years, Genomics England needed to upgrade its data infrastructure. The new infrastructure had to scale with the project's projected growth and deliver the performance required to reach the new target in an unprecedentedly short time.

Additionally, Genomics England's original data management solution had no viable disaster recovery strategy, leaving sensitive patient health data vulnerable in the event of a security breach or disaster. Backing up the project's more than 20 petabytes (PB) of data under management was also not financially feasible. Genomics England needed a flexible, secure, and highly scalable data management solution to realize its ambitious goals by 2023.

The new solution needed to:

Scale capacity in a single namespace to support up to 140 PB of data

Enhance data pipeline performance to keep pace with research objectives

Provide native disaster recovery and data encryption

Improve cost efficiencies to support tight budget constraints

The Solution

The WEKA Data Platform With Quantum ActiveScale Object Storage

WEKA provides Genomics England with the multi-tiered architecture needed to combine commodity flash and disk-based technologies in a single unified platform. The primary tier consists of 1.3 PB of high-performance NVMe-based flash storage that holds researchers' working datasets. The secondary tier consists of 40 PB of object storage that serves as a long-term data lake repository.

The WEKA Data Platform presents Genomics England's entire 41 PB dataset in a single namespace and automatically tiers data between flash and object storage to optimize workload performance and improve cost efficiency.

“We needed something that’s much more scalable than existing NAS solutions — an infrastructure that could grow to hundreds of petabytes. Our existing solution couldn’t provide that scale and wasn’t performing as well in these magnitudes — that’s what drove us to WEKA.”

David Ardley, Director of Infrastructure Transformation

Outcomes

Virtually unlimited scale to support exponential data growth

Massive performance improvements to accelerate innovation

Faster data processing and time to insight

Significantly improved cost efficiency per genome

Enhanced security with disaster recovery in place

Cloud integration for computing elasticity and data mobility

