Driving the Future of AI and HPC: WEKA at SC24

Colin Gallagher. November 19, 2024

This year’s Supercomputing Conference (SC24) has begun, bringing together leading experts in high-performance computing (HPC), AI, and data science who are driving data-driven innovation forward. Each year, SC showcases breakthrough research, cutting-edge technology, and emerging trends that define the future of computational power, and this year promises to be no exception. From advancements in exascale computing and AI workloads to breakthroughs in quantum technology, SC has become the proving ground for ideas that will reshape industries and society at large.

As the industry convenes to showcase the latest advancements in supercomputing, this year’s conference provides an ideal platform to introduce new solutions from WEKA and our partners, specifically crafted to address the evolving challenges in HPC and AI. With the growing complexity of AI models, data volumes, and real-time processing needs, organizations require scalable, resilient, and high-performance solutions that go beyond traditional storage and data management capabilities. Our latest innovations are designed to meet these demands, offering unprecedented speed, flexibility, and reliability for data-intensive workloads across diverse environments, from data centers to cloud and edge deployments. These solutions empower organizations to scale dynamically, maximize resource efficiency, and unlock insights faster, bridging the gap between cutting-edge research and real-world AI applications. At this year’s conference, WEKA and our partners NVIDIA, Supermicro, Arm, Run:ai, and others are announcing ways to scale robustly under fluctuating loads, orchestrate effectively across multiple components, enhance security and uptime, and deliver more AI compute power with a lower power and cooling footprint.

Accelerating AI with Unmatched Efficiency: The First Storage Solution for NVIDIA Grace CPU Superchips

As AI and high-performance computing (HPC) workloads evolve, they require incredibly fast data access and efficient processing power. WEKA, NVIDIA, Supermicro, and Arm are stepping up to meet this challenge, combining WEKA’s ultra-fast data platform with NVIDIA’s Grace CPU Superchip to set new standards for performance, scalability, and energy efficiency in data-intensive environments. This powerful duo enables faster AI training, reduced latency, and smarter resource use—all while keeping power consumption low.

Today, we announced the industry’s first high-performance storage solution built for the NVIDIA Grace™ CPU Superchip. Running on a robust new Supermicro storage server powered by WEKA® Data Platform software and Arm® Neoverse™ V2 cores, this solution leverages the NVIDIA Grace CPU Superchip to deliver unparalleled performance density and energy efficiency for accelerating enterprise AI workloads. By reducing I/O bottlenecks and enhancing data access, this joint solution lets data centers reach unprecedented performance with significantly lower energy consumption. The WEKA Data Platform is set to be available on Grace servers in early 2025, setting the stage for a future-ready infrastructure that can grow with you.

The NVIDIA Grace CPU Superchip, with its 144 high-performance Arm Neoverse V2 cores, delivers twice the energy efficiency of traditional x86 servers. Combined with WEKA’s AI-native data architecture, this setup ensures top performance across AI pipelines, maximizing GPU utilization and speeding up insights while using significantly less power. This combination empowers organizations to handle demanding AI workloads more effectively, boosting both speed and efficiency.

The Grace CPU Superchip features high-bandwidth LPDDR5X memory with up to 1 TB/s of memory bandwidth, which, together with WEKA’s architecture, removes data-path bottlenecks and keeps data flowing to the processors. This results in faster AI training, shorter epoch times, and higher inference speeds, allowing enterprises to scale AI workloads without sacrificing performance. Such resource optimization helps meet the demands of data-intensive environments smoothly and efficiently.

Beyond performance, this storage solution sets new standards in energy and space efficiency. Built for large-scale AI and modern HPC workloads, the WEKA Data Platform enables organizations to reduce data center footprint and energy consumption. The energy-efficient Grace CPU, together with WEKA’s infrastructure consolidation capabilities, empowers organizations to accomplish more with fewer resources, driving AI performance while supporting sustainability goals.

The WEKA Data Platform enhances GPU stack efficiency by 10 to 50 times, streamlining large-scale AI and HPC workloads. By reducing data duplication and enabling flexible cloud scalability, it shrinks data infrastructure needs by 4 to 7 times and significantly cuts carbon emissions, saving up to 260 tons of CO2e per petabyte stored each year and lowering energy costs by as much as 10 times. Coupled with the Grace CPU Superchip’s 2x energy efficiency, this solution empowers customers to accomplish more with fewer resources, supporting sustainability goals while maximizing AI performance.
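To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the quoted savings scale linearly with stored capacity and treats the "up to" numbers as upper bounds rather than guarantees. The function and constant names are hypothetical and are not part of any WEKA tool.

```python
from typing import Tuple

# Figures quoted in the text above; both are treated as bounds, not guarantees.
MAX_CO2E_TONS_PER_PB_YEAR = 260      # "up to 260 tons of CO2e per petabyte stored each year"
FOOTPRINT_REDUCTION_RANGE = (4, 7)   # infrastructure shrinks "by 4 to 7 times"


def max_co2e_savings_tons(petabytes: float, years: float = 1.0) -> float:
    """Upper-bound CO2e savings in tons (assumes linear scaling with capacity)."""
    return MAX_CO2E_TONS_PER_PB_YEAR * petabytes * years


def reduced_footprint_range(current_units: float) -> Tuple[float, float]:
    """Return (smallest, largest) footprint after a 7x or 4x reduction."""
    low_factor, high_factor = FOOTPRINT_REDUCTION_RANGE
    return current_units / high_factor, current_units / low_factor


if __name__ == "__main__":
    print(max_co2e_savings_tons(10))     # 10 PB for one year -> 2600.0
    print(reduced_footprint_range(28))   # 28 racks today -> (4.0, 7.0)
```

Actual savings depend on the workload, the baseline infrastructure, and the local energy mix, so these numbers are best read as a ceiling for planning discussions.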

This industry-first joint architecture enables businesses to reduce costs, accelerate performance, and get to market faster with AI, HPC, and data analytics. For companies running complex AI models, large-scale simulations, or real-time data processing, this powerful combination provides the speed, efficiency, and energy savings that are essential in a data-driven world.

Introducing WARRP: The Cloud-Agnostic AI RAG Reference Platform for Scalable, Sustainable Production Environments

As enterprises increasingly adopt AI-driven applications, deploying robust Retrieval-Augmented Generation (RAG) inference environments that can seamlessly handle large-scale, data-intensive workloads is becoming essential. However, moving from proof-of-concept to production brings complex challenges. These production AI environments require reliable scalability, efficient resource orchestration, and the ability to span diverse infrastructure, from on-premises data centers to multiple cloud providers. Additionally, ensuring high performance while managing costs, meeting strict security requirements, and minimizing carbon impact are ongoing hurdles for organizations trying to bring RAG solutions into production.

Today, we’re excited to introduce WARRP, the WEKA AI RAG Reference Platform—a cloud-agnostic solution designed to meet these production AI challenges head-on. Built to deliver consistent performance, simplified management, and scalable deployment across data centers and clouds, WARRP leverages WEKA’s advanced data platform to enable capabilities rarely possible on shared file systems. With high-performance vector database support, streamlined data transfer across locations, and flexible scalability, WARRP empowers organizations to deploy AI inference environments that are not only efficient and sustainable but also ready to evolve with new frameworks and tools as they emerge.

We designed WARRP to deliver consistent frameworks, manageability, and outcomes, whether deployed in a data center or in the cloud. WARRP takes advantage of WEKA’s unique capabilities, enabling tasks that are typically challenging on a shared file system. For instance, it supports high-performance vector databases, batch data ingestion over our high-speed POSIX interface while indexing over S3, and seamless data transfer between locations within the pipeline, such as ingesting data in one location and transforming it in another.

WARRP defines the essential layers needed for a robust production-grade RAG inference solution. It starts with the infrastructure layer, spanning multiple data centers or cloud providers. Next is the WEKA data layer, providing identical performance and capabilities across both on-premises and cloud environments. The orchestration layer follows, incorporating a Kubernetes container scheduler and a GPU orchestration solution like Run:ai. To simplify deployment and management, we’ve chosen NVIDIA’s frameworks, such as NIM and NeMo, which are part of the NVIDIA AI Enterprise stack. Above this layer is the development layer, using tools like Jupyter for coding. Next, middleware tools like LangSmith and Milvus (a distributed vector database) support deployment, with the models layered above them, either packaged as NVIDIA NIM microservices or containerized per enterprise requirements. Finally, applications built on all these layers deliver user interfaces and extract meaningful value.
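The layering described above can be written down as a simple ordered data structure. The following Python sketch is purely illustrative: the `Layer` class, the layer names, and the `describe_stack` helper are hypothetical and do not correspond to a WARRP API or configuration format. The example components are the ones named in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    """One layer of the reference stack with example components from the text."""
    name: str
    examples: Tuple[str, ...]


# Ordered from the bottom of the stack (infrastructure) to the top (applications).
WARRP_LAYERS: List[Layer] = [
    Layer("infrastructure", ("data centers", "cloud providers")),
    Layer("data", ("WEKA Data Platform",)),
    Layer("orchestration", ("Kubernetes", "Run:ai")),
    Layer("frameworks", ("NVIDIA NIM", "NVIDIA NeMo")),
    Layer("development", ("Jupyter",)),
    Layer("middleware", ("LangSmith", "Milvus")),
    Layer("models", ("NIM-packaged models", "custom containers")),
    Layer("applications", ("user-facing applications",)),
]


def describe_stack(layers: List[Layer]) -> List[str]:
    """Render the stack as numbered lines, bottom layer first."""
    return [
        f"{position}. {layer.name}: {', '.join(layer.examples)}"
        for position, layer in enumerate(layers, start=1)
    ]


if __name__ == "__main__":
    for line in describe_stack(WARRP_LAYERS):
        print(line)
```

Keeping the layers in an explicit ordered list makes it easy to see where a newly validated framework would slot in without disturbing the layers above or below it.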

WARRP’s core strength is its ability to dynamically scale with inference demands and toggle between fine-tuning and inference as needed. It also enables running a distributed vector database on WEKA, offering exceptional performance and scalability for the entire RAG pipeline, with the added flexibility to back up and send data to remote environments for redundancy or bursting.
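As a rough illustration of what toggling between fine-tuning and inference can mean in practice, the sketch below splits a fixed GPU pool according to current inference demand. This is a hypothetical policy written for this post, not how WARRP, Kubernetes, or Run:ai actually schedule workloads. The function name, parameters, and throughput figures are all assumptions.

```python
import math
from typing import Dict


def split_gpu_pool(
    total_gpus: int,
    requests_per_sec: float,
    requests_per_gpu_per_sec: float,
    min_inference_gpus: int = 1,
) -> Dict[str, int]:
    """Give inference the GPUs it needs; leave the remainder for fine-tuning.

    Hypothetical policy: inference demand is served first, capped at the
    pool size, and never drops below `min_inference_gpus`.
    """
    if requests_per_gpu_per_sec <= 0:
        raise ValueError("requests_per_gpu_per_sec must be positive")
    needed = math.ceil(requests_per_sec / requests_per_gpu_per_sec)
    inference = min(total_gpus, max(min_inference_gpus, needed))
    return {"inference": inference, "fine_tuning": total_gpus - inference}


if __name__ == "__main__":
    # Assumed throughput of 50 requests/s per GPU on a 16-GPU pool.
    print(split_gpu_pool(16, 450, 50))    # moderate load
    print(split_gpu_pool(16, 2000, 50))   # peak load: all GPUs serve inference
    print(split_gpu_pool(16, 0, 50))      # idle: most GPUs go to fine-tuning
```

A production scheduler would also account for job preemption cost, model load times, and GPU memory limits, all of which this sketch ignores.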

With WARRP, we’ve established the fundamental layers and validated specific frameworks for a production RAG inference pipeline. Going forward, we will continuously add new frameworks to each layer, such as managed Kubernetes services like EKS, AKS, or GKE, and integrate additional community-released frameworks demonstrating value. This iterative approach keeps WARRP aligned with the latest advancements in the AI ecosystem.

As SC24 highlights the forefront of innovation in high-performance computing and AI, WEKA is proud to contribute solutions designed to tackle the real challenges of modern data-intensive environments. Our collaborations with partners like NVIDIA, Supermicro, and Arm bring cutting-edge technologies that redefine what’s possible in enterprise AI and HPC workloads. From our revolutionary storage solution for the NVIDIA Grace CPU Superchip to the versatile WARRP architecture, WEKA is focused on delivering scalable, efficient, and energy-conscious platforms that empower organizations to take AI from proof-of-concept to full-scale production. These solutions address the demands of today’s workloads while being future-ready for evolving needs. Through reduced power consumption, optimized data handling, and cloud-agnostic flexibility, WEKA equips enterprises with robust tools that accelerate time-to-insight, maximize resource utilization, and help achieve sustainability goals. As we move forward, we’re committed to supporting the industry with adaptive architectures that can seamlessly incorporate emerging technologies, ensuring that enterprises stay ahead in an increasingly data-driven world.

Explore the WARRP Reference Architecture
