Modern Workloads Need a Modern Infrastructure
In the last blog post we discussed the characteristics of modern workloads. The challenge of managing the sheer amount of data being generated, coupled with the need to glean insights from it more quickly, has created an infrastructure nightmare. Legacy compute and storage solutions were designed to solve yesterday’s problems and can’t handle the needs and complexities that organizations face now or will face in the near future.
Organizations have been reporting unstructured data growth of over 50% year over year. At the same time, 79% of enterprise executives agree that failing to extract value and insight from this data could drive their businesses to extinction. The question, therefore, is how to manage and store all of this data to maintain your competitive advantage in the marketspace, and which barriers to avoid as you traverse the path to data management success.
The Impact of Data Volumes on Storage and Compute
First, let’s look at the cause of the problem. The boom in big data has created amazing new opportunities for innovation, but also unforeseen problems for data-intensive applications, particularly in AI/ML, financial analytics, genomics/life sciences, and technical computing. Scaling datasets to extreme capacities while maintaining the maximum performance these compute- and data-intensive applications demand is a challenge in itself. Legacy compute and storage solutions solved yesterday’s problems well, but the harsh reality is that they can’t meet the needs and complexities organizations face now or will face soon.
Specifically, companies need to eliminate the complexity and compromises associated with legacy storage (DAS, NAS, SAN) and find the necessary performance for their workloads at ANY scale. Here are some of the common challenges that they face:
- Creating large petabyte- to exabyte-scale data stores
- Leveraging more data, faster, for a competitive advantage
- Efficiently utilizing resources at scale
- Feeding today’s accelerated compute and maximizing compute utilization
- Leveraging the cloud with existing workloads
- Enabling multiple sites to work on the same data
- Unifying multiple data sources and formats under a single infrastructure
- Managing volumes of unstructured, multi-format data
- Supporting containerized applications, especially stateful ones
Today data is the new oil: valuable insight can be unleashed from it in a timely manner, but technology can be a barrier. Modernizing your infrastructure with a modern data architecture is key to unlocking data’s full potential. All of these challenges are daunting taken individually, but many organizations face several of them at the same time. This condition is particularly acute in data-intensive areas that need distributed, accelerated computing, including AI, machine learning, and deep learning.
Examples of Modern Workloads
We know that a workload, in general, is any application or program that runs on a computer to produce some kind of result. Workloads can range from the simple, like a note-taking app on a smartphone, to the amazingly complex, like enterprise workflow automation that connects thousands of clients through hundreds of servers across a huge network.
In terms of modern enterprise data storage networks, workloads are huge and distributed, come in different formats, and can reside anywhere. That’s because data is everywhere, across many use cases, and those managing workloads in the most demanding use cases are the ones feeling the most pain:
- AI/ML–The term “living on the edge” has taken on new meaning. Traditional edge devices were essentially bridges between networks, things like routers connecting a college campus to the internet. Modern edge devices have evolved to include sensors that actually generate data, such as manufacturing machinery that interacts with different sensor networks at the “micro edge.” Another example is smart cars and autonomous driving, where the compute is in the car, collecting data and making some immediate decisions for action, yet the training is done elsewhere.
- Life Sciences–Genomic sequencers process huge numbers of genomes per day, all at different stages of sequencing, using different files and different formats. Performance and reliability are key to faster time to insight, such as creating drugs to prevent illness and cure disease.
- Financial Services–Practitioners in high-frequency trading, fraud detection/prevention, and risk management all need blazing-fast performance from accelerated high-performance compute clusters to deliver ever-faster results: making quick trading transactions, catching the bad guys who want your money, and mitigating business risks at every turn.
What are the Characteristics of a Modern Data Architecture?
Modern data architectures use modern technology and tools to help you get the most out of your data by using your infrastructure to best advantage. Let’s look at some of the things a modern architecture does.
- Supports cloud tiering
- Enables data mobility
- Scales with data growth and demand
- Supports accelerated compute, including networking and storage
- Removes data silos and supports mixed workloads without performance bottlenecks
- Supports high-performance containerized workloads
At its most fundamental level, tiering to the cloud reduces storage total cost of ownership (TCO) by giving you access to on-demand storage and compute. There’s no need to provision and plan ahead of time. Managing bursts of demand is easy, and the months of connecting new hardware in your data center are reduced to seconds. Modern storage solutions make the cloud easy to manage by virtualizing your on-prem and cloud environments into a single, easy-to-manage namespace.
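To ground the tiering idea in a concrete, vendor-neutral example (this uses standard S3 lifecycle tooling, not the single-namespace capability described above), the sketch below demotes aging objects to cheaper storage classes automatically; the bucket name is a hypothetical placeholder:

```python
# A generic illustration of cloud tiering: an S3 lifecycle rule that
# automatically moves cold objects to cheaper storage classes.
# "example-bucket" is a hypothetical name, not from this post.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "demote-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [
                    # After 30 days, move to infrequent-access storage...
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # ...and after 90 days, to archival storage.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```

The same principle, demoting cold data to a cheaper tier while keeping it addressable, is what lets a modern storage namespace cut TCO without changing how applications see their files.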
The introduction of the cloud and multiple sites leads to a new challenge: moving data between sites seamlessly, making it available when needed, to the right resources, and without additional delay. Sophisticated data mobility capabilities move data seamlessly between the fast tier and the object tier, giving enterprises far more infrastructure flexibility without compromising performance.
Data centers need to scale storage to accommodate growth: growth in data volume, growth in users of the data, and growth in data processing requirements. Smart companies start small and grow their infrastructures as their businesses grow. They can (1) scale for disaster recovery to store a copy of data in the same format in the cloud; (2) scale for capacity to respond to unpredictable business demands without changing process or infrastructure; (3) burst for compute to leverage cloud resources when they’re short on cores in the data center; (4) scale across Availability Zones (AZs) to analyze big data in the cloud and span AZs with no performance degradation; and (5) scale across workloads in different storage tiers, whether hot or cold, and still receive consistent, predictable performance.
The modern buyer journey involves investing in server platforms that can leverage compute acceleration technologies like GPUs. AI needs a powerful compute infrastructure to explore, extract, and examine data to gain deep insights and deliver breakthrough results, and GPUs are at the heart of modern supercomputing. As the quintessential workhorses and multitaskers, GPUs easily manage the most complex data sets in AI workloads. There are powerful platforms to choose from that balance accelerated compute, memory, and high-speed NVLink interconnects to process those workloads with unprecedented performance. The payoff is not only faster results but higher-quality output, since more iterations in testing models means greater model accuracy. High-performance compute will be properly utilized only if the “pipes” that feed it (bandwidth and IOPS) offer matching performance. Traditional NAS, for example, offers up to 1.1 GB/sec, while networks today offer up to 400 Gb/sec and modern storage can support up to 162 GB/sec of throughput as well as 970K IOPS.
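To make the bandwidth arithmetic concrete, here is a rough back-of-envelope sketch in Python; the cluster size and per-GPU ingest rate are illustrative assumptions, not figures from this post:

```python
# Back-of-envelope check: can the storage "pipes" keep the GPUs fed?
# Cluster size and per-GPU ingest rate below are assumptions for illustration.

GPUS = 64                       # GPUs in the training cluster (assumed)
INGEST_PER_GPU_GBS = 2.0        # read rate each GPU needs, GB/s (assumed)
STORAGE_THROUGHPUT_GBS = 162.0  # aggregate storage throughput cited above, GB/s
NETWORK_GBPS = 400              # network link speed in gigabits per second
NETWORK_GBS = NETWORK_GBPS / 8  # convert Gb/s to GB/s (~50 GB/s per link)

required = GPUS * INGEST_PER_GPU_GBS
print(f"GPUs need {required:.0f} GB/s; storage can deliver {STORAGE_THROUGHPUT_GBS:.0f} GB/s")

links_needed = int(-(-required // NETWORK_GBS))  # ceiling division
print(f"At {NETWORK_GBPS} Gb/s per link, feeding them takes at least {links_needed} link(s)")
```

The point of the exercise: a single legacy NAS head at ~1.1 GB/sec would starve this cluster, while storage and networking sized to match each other keep every GPU busy.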
Data silos waste resources, so building your infrastructure to eliminate them is key. The problem for some companies is that while network and compute can be virtualized to operate at scale very effectively, their storage remains largely isolated in silos based on system performance profiles. Consequently, they are forced to architect multiple storage solutions, each optimized for one data format or another. The result is a storage infrastructure that is complex, temperamental, expensive, and slow. The solution is a modern infrastructure built on a single, flexible storage architecture: a software-only, high-performance, file-based storage solution that is highly scalable and easy to deploy, configure, manage, and expand.
You need a system that can provision and manage shared file storage for containerized workloads and that interoperates with container orchestration platforms like Kubernetes. The beauty of a software solution is that it can provide an interface between the logical volumes in a Kubernetes environment (known as Persistent Volumes) and the storage, so that you can deploy stateless clients, provision a Kubernetes pod volume, and simplify the process of moving containerized workloads to the cloud or sharing them across multiple clusters.
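As a minimal sketch of what provisioning shared file storage for a pod looks like in practice, here is a PersistentVolumeClaim created with the official Kubernetes Python client; this is generic Kubernetes tooling, not any vendor-specific API, and the storage class name is a hypothetical placeholder:

```python
# Request shared file storage for containerized workloads via a
# PersistentVolumeClaim, using the official Kubernetes Python client.
# The storage class "example-fs" is a hypothetical placeholder.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="training-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],   # shared across many pods at once
        storage_class_name="example-fs",  # hypothetical CSI storage class
        resources=client.V1ResourceRequirements(
            requests={"storage": "1Ti"}
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```

A pod then references the claim as a volume, so stateless clients can attach to the same shared file system wherever they run, on premises or in the cloud.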
WekaFS™ – Modern Storage for Modern Workloads
Today’s workloads demand a new class of storage that delivers the performance, manageability, and scalability required to obtain or sustain an organization’s competitive advantage. The WekaFS Data Platform was designed and optimized for data-intensive modern workloads. It is engineered to take storage performance and data availability to the next level as performance demands intensify in Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Its architecture and performance are designed to maximize your usage of GPUs across cloud, on-premises, or hybrid deployments, providing data management capabilities that can accelerate time to insight/time to epoch by as much as 80x.
Contact Weka to find out more about what the simplicity, speed, and scale of WekaFS can do to accelerate your modern workloads.