The NVIDIA GB200 (Blackwell) platform is redefining AI compute with incredible speed, scale, and efficiency. Designed to handle the massive demands of next-generation AI workloads, NVIDIA Blackwell delivers groundbreaking performance in reasoning models, AI agents, and token generation. With its advanced architecture, Blackwell GPUs provide ultra-high-speed communication, immense memory bandwidth, and unparalleled compute power—critical elements for real-time AI decision-making. As AI shifts beyond mere training to sophisticated inference and reasoning, infrastructure must evolve to keep pace.

However, raw GPU performance alone isn’t enough. To fully unlock the potential of AI reasoning, cloud providers need an equally high-performance data infrastructure that eliminates bottlenecks and maximizes utilization. That’s why WEKA has achieved certification as a high-performance data store for NVIDIA GB200 deployments, supporting NVIDIA Cloud Partners (NCPs). For NCPs building AI clouds, GPU-as-a-service, or other neocloud offerings, this certification means they can now deliver the fastest, most scalable data infrastructure available. It builds on our recent certification for NVIDIA HGX H100/H200 systems, extending the value we deliver to long-standing partnerships with leading neocloud providers such as Yotta, Ori Cloud, Sustainable Metal Cloud, and many others.

Why This Matters: AI Isn’t Just Faster—It’s Fundamentally Different

The AI landscape has shifted. With the rise of reasoning models and AI agents, workloads are becoming more complex, requiring an unprecedented combination of high-speed communication, memory, and compute power. AI reasoning isn’t just about crunching numbers—it’s about generating and processing vast amounts of tokens in real time, demanding an infrastructure that can keep up with the GPUs.

Neocloud providers have stepped up, going beyond simple GPU-by-the-hour rentals to offer fully optimized AI cloud services. But legacy data storage is holding them back. Here’s why:

  • Performance Bottlenecks – Traditional storage can’t keep up with the I/O demands of modern AI workloads, and high latency between compute and data infrastructure leads to GPU underutilization.
  • Inefficient Scaling – Many providers are forced to over-provision storage just to meet performance targets, driving up costs.
  • Weak Multi-tenancy – Legacy storage lacks robust isolation, forcing providers to create inefficient storage silos per customer.
  • High Complexity and Cost – Traditional resilience models rely on cumbersome replication strategies that inflate expenses.

WEKA and NVIDIA GB200: Built for the AI Era

With WEKA now certified for NVIDIA GB200 deployments, NCPs can supercharge their AI cloud offerings with:

  • Unmatched Performance – WEKA’s zero-tuning architecture optimizes dynamically for any workload, delivering sub-millisecond latency and millions of IOPS. A single 8U entry configuration meets the extreme I/O demands of a GB200 Blackwell scalable unit (1,152 GPUs).
  • S3 Object Storage Optimized for AI Pipelines – WEKA’s S3 interface delivers ultra-low latency and high throughput, optimizing small-object access for AI, ML, and analytics workloads (see the access sketch after this list).
  • Maximized GPU Utilization – Storage bottlenecks kill AI performance. WEKA eliminates them, typically improving data performance by 10x or more. Customers have seen GPU utilization soar from 30-40% to over 90% in real-world deployments.
  • True Multitenancy – WEKA’s composable clusters provide logical and physical isolation by leveraging the inherent isolation capabilities of containers, allowing secure, high-performance, multi-tenant AI cloud services without compromise.
  • Massive Scale – WEKA supports up to 32,000 NVIDIA GPUs in a single namespace, allowing NCPs to scale globally without architectural limitations.
  • Seamless Migrations – Whether in data centers, hyperscale clouds, or neo clouds, the same WEKA software runs everywhere, making workload migration effortless.
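As a rough illustration of the S3 point above, the sketch below reads a training data shard through an S3-compatible endpoint using boto3. The endpoint URL, bucket, key, and credentials are placeholder values introduced here for illustration, not WEKA-specific names; substitute the details of your own cluster and access policy.

```python
import boto3

# Minimal sketch: fetch a shard of tokenized training data through an
# S3-compatible interface. All names below are hypothetical placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://weka-s3.example.internal:9000",  # hypothetical WEKA S3 endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Stream the object body directly into the data pipeline.
obj = s3.get_object(Bucket="training-data", Key="shards/shard-00042.bin")
payload = obj["Body"].read()
print(f"fetched {len(payload)} bytes")
```

Because the interface is S3-compatible, existing data loaders and ML frameworks that already speak S3 can point at the same namespace without code changes.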

Benchmarking and Real-World Performance

WEKApod appliances deliver incredible performance density and power efficiency to NVIDIA Cloud Partner deployments.

WEKApod Nitro Appliance

  • Throughput Performance – Each WEKApod node achieves 70 GB/s read (560 GB/s for the minimum configuration) and 40 GB/s write throughput (320 GB/s for the minimum configuration), ensuring that Blackwell GPUs are constantly fed with high-speed data to maximize utilization.
  • Latency Optimization – With sub-millisecond latency, WEKA ensures minimal delays in AI training and inference workloads, enabling real-time reasoning AI models.
  • Scalability in Action – NCPs leveraging WEKApod have scaled from petabytes to exabytes without compromising performance, supporting thousands of concurrent workloads.
  • GPU Utilization Gains – WEKApod customers running large-scale AI workloads have reported 90%+ GPU utilization rates, compared to traditional storage solutions, which often leave GPUs idle due to I/O bottlenecks.
  • Energy Efficiency – WEKApod’s optimized data handling reduces power consumption per AI workload, lowering overall operational costs for AI cloud providers.
  • NVIDIA Certified – WEKA’s inclusion in the NVIDIA Certified Systems Storage program ensures high-performance, scalable, and reliable storage solutions optimized for NVIDIA’s AI and data analytics workloads.

Proper storage sizing is critical to ensuring optimal AI training and inference performance. The storage performance targets vary based on the model type, dataset size, and workload characteristics. To support high-performance training and inference on NVIDIA MGX systems, the WEKA Data Platform provides scalable, high-throughput storage that meets the needs of modern AI workloads.

For large-scale AI training, read and write performance plays a crucial role—particularly for checkpointing, which is a synchronous operation that can stall training if not properly optimized. Large Language Models (LLMs) require substantial write throughput during checkpointing, with estimated write rates scaling based on model size. For example, a 530-billion parameter model may require a total write rate of 206 GB/s, while a 1-trillion parameter model could require nearly 389 GB/s of aggregate write performance.
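To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a linear scaling factor of roughly 0.389 GB/s per billion parameters, inferred from the two figures above, and uses the per-node write throughput cited for WEKApod Nitro; actual sizing also depends on checkpoint frequency, format, and parallelism strategy, and should follow the certified guidance in the table below.

```python
import math

# Assumptions (not official sizing guidance):
WRITE_GBPS_PER_BILLION_PARAMS = 0.389  # inferred from 530B -> 206 GB/s, 1T -> 389 GB/s
NODE_WRITE_GBPS = 40                   # per-node write throughput (WEKApod Nitro)
MIN_NODES = 8                          # minimum WEKApod configuration

def required_write_gbps(params_billions: float) -> float:
    """Aggregate write rate needed so checkpointing does not stall training."""
    return params_billions * WRITE_GBPS_PER_BILLION_PARAMS

def nodes_for_write_rate(write_gbps: float) -> int:
    """Smallest node count that sustains the target write rate."""
    return max(MIN_NODES, math.ceil(write_gbps / NODE_WRITE_GBPS))

for params in (530, 1000):  # 530B and 1T parameter models
    rate = required_write_gbps(params)
    print(f"{params}B params: ~{rate:.0f} GB/s write -> {nodes_for_write_rate(rate)} nodes")
```

On write bandwidth alone, the 530-billion parameter case lands within the eight-node minimum configuration, while the 1-trillion parameter case needs roughly ten nodes, before capacity and read requirements are factored in.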

The table below highlights the storage density and certified performance of WEKApod Nitro appliances, showing the node counts required to meet the Standard and Enhanced guidance/performance requirements for NVIDIA GB200 NVL72 racks with a minimum storage capacity of 10,924 TB:

| SU Groups | GPUs | WEKApod Nodes (Standard) | WEKApod Nodes (Enhanced) |
| --- | --- | --- | --- |
| 1 | 1,152 | 8 | 9 |
| 2 | 2,304 | 8 | 17 |
| 4 | 4,608 | 11 | 33 |
| 8 | 9,216 | 22 | 66 |
| 16 | 18,432 | 44 | 132 |

By leveraging WEKApod, cloud providers can remove storage as a bottleneck, ensuring that their Blackwell GPUs operate at full efficiency without requiring excessive over-provisioning.

The Future of AI Infrastructure is Here

With WEKA and NVIDIA GB200 NVL72, AI cloud providers no longer need to compromise on performance, scalability, or security. The era of AI reasoning demands a new kind of data infrastructure—one that’s fast, efficient, and built to handle the explosive growth of the token economy.

If you’re an NVIDIA Cloud Partner building the next-generation AI cloud, let’s talk. It’s time to unshackle your GPUs and unleash the full potential of AI.

See How WEKA Powers Neoclouds