Snap-To-Object (S2O): A Multifaceted Data Management Tool for DataOps Teams

Anand Nadathur. March 13, 2023

Snapshot technology is a foundational capability that many data operations are built upon. Most products and solutions in the data protection ecosystem leverage snapshots to secure the original copy of data against all kinds of compromise, ranging from accidental deletion (OMG! Why did you use rm *?!) to disaster recovery to ransomware protection. Data mobility and collaboration tools gain operational efficiency by depending on the incremental nature of snapshots to transfer and update data between sites. Beyond these, snapshots are also used at scale to improve the performance of multi-reader, read-intensive applications (think multiple users streaming the same movie simultaneously from an online content platform) or intensive financial transaction analysis applications.

In all of these examples, budget-conscious DataOps teams face the same challenge: storing snapshot data at scale cost-effectively, so that applications can access the versions of a dataset they need instantly while the majority of the remaining versions sit on a cold, inexpensive medium.

Legacy storage vendors are able to meet the performance needs of traditional applications through their All-Flash offerings but penalize customers from a total cost of ownership (TCO) perspective. Replication products are cost-prohibitive for next-generation applications requiring data sharing and availability in distributed sites at scale. Hybrid solutions that combine spinning disks with NVMe flash storage are able to reduce system costs for a limited segment of a data pipeline workflow but incur high TCO for operations involving the large data sets that DataOps teams are asked to manage efficiently for all of their data-intensive applications.

The WEKA Data Platform

WEKA has built a software-only, high-performance clustered data platform that is highly scalable and easy to deploy, configure, manage, and expand. The key tenets of this hybrid architecture, which accelerates a variety of next-generation workloads such as AI/ML modeling, genomic sequencing, VFX editing, DevOps, EDA design, HPC, financial risk analysis and more, are Speed, Simplicity, and Scale, with an attractive TCO. At the core of this solution is a fully distributed parallel file system that combines NVMe flash with object storage in a single namespace, providing high performance and low latency for multiple workloads through different protocols such as POSIX, GPUDirect Storage, NFS, SMB, and S3. This single hybrid system eliminates the need for copying or moving data at scale to meet the data resiliency, availability, and performance specifications that next-gen applications require at different stages of their data operations life cycle.

WEKA Snap-To-Object (S2O) is a capability that layers on top of the WEKA file system (WekaFS) and simplifies data management at scale for customers facing data mobility and data protection challenges with large data sets. S2O is one of WEKA’s crown jewels and is widely used by its customers. DataOps teams use S2O to move WekaFS snapshot data and metadata to an object store. The object store can be an on-premises content platform accessible through the standard S3 protocol or an object store in any of the cloud hyperscalers such as AWS, Azure, GCP or OCI. The data on the object store can be restored instantly at any time using S2O for operational recovery, disaster recovery, cloud bursting, archiving, collaboration, edge-to-core computing and other use cases. S2O enables data mobility and availability at scale between one or many sites, across NVMe and object storage, using snapshot technology. When snapshots are moved using S2O, the data transferred is incremental and minimal in size, keeping capacity requirements low while providing high-speed updates of snapshots to the object store.
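
To make the idea of an incremental transfer concrete, here is a toy Python sketch. It is not WEKA's implementation or API: the block lists, the use of SHA-256 digests, and the function names `block_digests` and `incremental_upload` are all assumptions made for illustration. It only shows the general principle that, given a previous snapshot, just the new or changed blocks need to travel to the object store.

```python
import hashlib


def block_digests(blocks):
    """Map each block index to the SHA-256 digest of that block's bytes."""
    return {i: hashlib.sha256(b).hexdigest() for i, b in enumerate(blocks)}


def incremental_upload(previous, current):
    """Return the indices of blocks in `current` that are new or changed
    relative to `previous` (hypothetical helper, illustration only)."""
    prev = block_digests(previous)
    curr = block_digests(current)
    return sorted(i for i, digest in curr.items() if prev.get(i) != digest)


# Two hypothetical snapshots of the same small dataset.
snap1 = [b"alpha", b"beta", b"gamma"]
snap2 = [b"alpha", b"BETA", b"gamma", b"delta"]

# Only the modified block (index 1) and the appended block (index 3) differ.
print(incremental_upload(snap1, snap2))  # [1, 3]
```

In this toy model an unchanged snapshot produces an empty upload list, which is why frequent snapshots of slowly changing data can stay cheap in both bandwidth and capacity.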

Operational and Disaster Recovery

The DataOps team in every enterprise is responsible for instant data recovery, be it for single-file recovery requests or site-level disaster recovery events. The solution in place has to meet Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) for all of the workload types detailed above, especially when operating at scale. The request to restore a few FASTQ or BAM files (a few hundred GB) may come at the same time as the system is setting up a rerun for a machine learning model that predicts your chances of developing certain health conditions. Using S2O, WEKA can rehydrate a filesystem into a new cluster or restore a filesystem in an existing cluster in parallel to recover from a range of different failure scenarios. Snapshots can even be synchronized to another cluster in advance to enable shorter RPO and RTO. One differentiating aspect of the WEKA solution is that the primary and the recovery WEKA clusters do not have to be the same size. This can save costs by allowing a smaller cluster at the site where the filesystem is being recovered.
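
As a rough planning aid, the relationship between snapshot frequency and RPO can be sketched with simple arithmetic. The model below is a simplifying assumption, not a WEKA formula: it treats the worst-case data-loss window as the snapshot interval plus the time needed to upload the changed data, and the function name and the example figures are hypothetical.

```python
def worst_case_rpo_seconds(snapshot_interval_s, changed_bytes, upload_bytes_per_s):
    """Estimate the worst-case data-loss window, in seconds.

    Assumed model: data written right after a snapshot is only protected
    once the next snapshot is taken (one interval later) and its changed
    data has finished uploading to the object store.
    """
    if upload_bytes_per_s <= 0:
        raise ValueError("upload_bytes_per_s must be positive")
    upload_time_s = changed_bytes / upload_bytes_per_s
    return snapshot_interval_s + upload_time_s


# Hypothetical figures: a snapshot every 5 minutes, 50 GB changed per
# interval, and 1 GB/s of sustained upload bandwidth.
rpo = worst_case_rpo_seconds(300, 50 * 10**9, 10**9)
print(rpo)  # 350.0
```

Under these assumptions, shrinking either the interval or the amount of changed data per snapshot tightens the achievable RPO, which is where incremental snapshot transfers help.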

Cloud Bursting

Architecting an infrastructure that can handle bursty applications such as video streaming, or on-demand requests from application teams for high-performance computing environments, is a recurring challenge that DataOps teams in all enterprises should be prepared for. Enterprises can leverage the elastic nature of the cloud to absorb unpredictable workload requirements. DataOps teams can bring up a WEKA cluster that does not need to be the same size as the original cluster, which saves on costs and accommodates the availability of resources in the cloud. Then, using S2O, all of the cluster’s data can be rehydrated in a matter of minutes. WEKA auto-scaling provides on-demand elasticity to match the storage performance needed to accelerate applications as before. DataOps teams can burst to any cloud of their choice at any time using S2O and a copy of the data stored in the object store. Budget-constrained teams that spin down cloud instances to save on costs can spin up clusters when they want to resume operations seamlessly where they left off.

Collaboration

S2O allows the DataOps team to synchronize data between two sites, enabling seamless and efficient collaboration. The secondary site can use S2O to get near-synchronous copies of data from the primary site. DataOps teams can use S2O to publish changes made at the primary site as frequently as every few minutes. Teams can get access to new files without closing existing open files on the secondary site. This allows DataOps teams such as those in VFX environments to collaborate efficiently on large video files across continents and continue using tools such as Autodesk Flame without having to restart them to get new files created at the other site.

Edge to Core Computing

DataOps teams operating at multiple sites need the ability to leverage consolidated compute resources available at a centralized location while powering application demands at the edge. S2O can enable data mobility and cost-effective sharing of data at scale between sites using the incremental nature of snapshots. S2O can easily be set up to share data both ways, edge to core and core to edge, or to consolidate content from multiple edge sites to a single core site. Edge and core sites can be in the cloud or on-premises and can scale from small 6-node clusters to clusters of hundreds of nodes, allowing DataOps teams to match resources to site-level application demands.

S2O Accelerates DataOps

DataOps teams enable the creation of business value from raw data. They strive to accelerate application outcomes while providing data mobility and availability. Their success is measured by performing operations efficiently, sustainably, and without compromising performance. WEKA S2O is a multifaceted tool that every DataOps team can benefit from to accelerate data availability to demanding applications whenever and wherever it is needed.

TL;DR – Five Reasons to Use Snap-To-Object

Snap-To-Object uses snapshot technology to commit a copy of data and its metadata to an object store enabling efficient and sustainable storage operations.
Data committed to an object store by Snap-To-Object can be rehydrated in a matter of minutes for Cloud Bursting, Disaster Recovery and Collaboration use cases.
Snap-To-Object can synchronize data between sites and move incremental data from one site to another enabling efficient data sharing without application downtime.
Snap-To-Object enables efficient collaboration of large amounts of data between multiple Edge and Core sites.
Data archived to an object store by Snap-To-Object can be retrieved on-demand using current and future versions of WEKA software.
