AWS re:Invent 2024 is in the rearview mirror, but the future of AI in the cloud has never been brighter. Here are a few observations from WEKA’s time spent at AWS re:Invent.

Transforming the Cloud Data Center for AI

There’s no doubt that data centers are being completely rebuilt to support both current and future waves of AI innovation. You could make a drinking game out of tech executives claiming that data centers and networks need to be completely rebuilt to support next-generation infrastructure, including accelerated compute, AI training at massive scale, globally distributed inference, and AI-powered applications. They’re right. At WEKA, we’re seeing it firsthand and working with virtually every major infrastructure and AI provider to accelerate this build-out. As Liran points out, it’s literally why our company was founded almost 10 years ago: to create a single, unified data solution that could address the increasingly accelerated compute and data processing demands organizations faced with the rise of artificial intelligence (AI) and GPU-accelerated high-performance computing (HPC) workloads.

While it’s no surprise to us, I noticed a lot of raised eyebrows and a-ha moments during the re:Invent sessions covering data center innovation. The big moments: AWS custom silicon, an effort that actually started in 2012 and now spans the Nitro System, Graviton, Trainium, and Inferentia chips, plus custom servers, networking, and data storage gear everywhere. The big reveal for me here is not the fact of custom silicon (every provider now ticks the box on “we have our own chip designs based on ARM”); it’s the level of detail and thought going into these new designs, for things like more efficient power delivery to each individual Trainium2 chip, the networking interconnects in the Trainium2 UltraServers, and Project Rainier, the supercomputer being built to train the next generation of Anthropic Claude models. For those who want to dive deep into this, see Peter DeSantis’s excellent keynote; it’s like a computer science course on how to build AI-accelerated supercomputers.

I think that last point is one to watch from AWS if you look under the covers of the Trainium2 server itself (which was awesome to see live in the AWS Expo Hall, by the way). The stats are impressive (20.8 PFLOPS of compute, 1.5 TB of high-bandwidth memory), and when you look at the chips, the high-density on-board networking (like NeuronLink) designed to eliminate cabling and increase bandwidth, the power delivery, the liquid cooling, and the rack design, you see all the elements of the data center transformations happening everywhere. While not on the same level as NVIDIA Blackwell Superchips, you can see a lot of iteration along similar lines. I’m excited to see where this goes next.

Again, it’s no surprise for us at WEKA; we’re seeing it in every major AI provider. I would just highlight what you don’t see much mention of: meaningful improvements in the underlying data storage systems to enable all these advances in compute and networking. That’s where WEKA comes in. We’re already working with AWS on things like WEKA for SageMaker HyperPod (see more below), as well as NVIDIA SuperPOD certification and the WEKA AI RAG Reference Platform (WARRP). WEKA provides the data firehose that can feed these next-generation supercomputers, and we’re already working with the industry leaders on their data center transformations.

Offer Building Blocks and Let Builders Build

The entire stack of compute (EC2), storage (S3), networking, and database (Aurora/DynamoDB) building blocks got big refreshes to support the current wave of AI model development and the coming waves of multi-model, agentic, API-driven, and embedded AI applications.

The most exciting launch for our team was the big push around Amazon SageMaker, with lots of new capabilities for SageMaker AI specifically. This is particularly exciting at WEKA, given the traction we’re seeing with the WEKA integration with SageMaker HyperPod. As a refresher, SageMaker HyperPod provides a highly resilient, standard architecture for distributed model training at massive scale. Andy Jassy does a nice job explaining that most AI customers on AWS are using SageMaker HyperPod for distributed model training. WEKA for HyperPod enables a big acceleration in wall-clock times, faster model checkpoints, and much greater utilization of the GPUs within the HyperPod cluster. Most SageMaker HyperPod customers who have tried WEKA are seeing a ton of value in the integration. For example, Stability AI, a very early HyperPod adopter, uses WEKA to drive high-performance data operations into the training environment. Once WEKA was deployed, Stability AI was able to accelerate wall-clock times by 35%, increase GPU utilization from 30-35% before WEKA to above 90%, and increase developer productivity with simplified data operations.
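To make the checkpointing point concrete, here’s a minimal sketch (not WEKA’s or AWS’s official integration, just an illustration) of how a distributed PyTorch training loop typically writes checkpoints to a shared filesystem; the mount path is a hypothetical example. The faster that shared write completes, the sooner every GPU in the cluster gets back to training.

```python
# Minimal sketch of periodic checkpointing in a distributed PyTorch training
# run (assumes torch.distributed has already been initialized). The mount
# point below is a hypothetical path for a shared, POSIX-compliant filesystem
# visible to every node in the cluster.
import os
import torch
import torch.distributed as dist

CKPT_DIR = "/mnt/weka/checkpoints"  # assumed shared mount, not an official path

def save_checkpoint(model, optimizer, step):
    """Rank 0 writes the checkpoint; other ranks wait at the barrier.

    The faster the shared filesystem absorbs this write, the less time the
    whole cluster's GPUs sit idle between training steps.
    """
    if dist.get_rank() == 0:
        os.makedirs(CKPT_DIR, exist_ok=True)
        torch.save(
            {
                "step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
            },
            os.path.join(CKPT_DIR, f"step_{step:08d}.pt"),
        )
    dist.barrier()  # keep all ranks in sync before training resumes
```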

From a compute perspective, the main message coming out of Amazon was: “The cost of compute really matters, and customers are hungry for more options and better price performance.” The launches here really backed that up, with new EC2 P6 instances featuring NVIDIA Blackwell chips coming next year and the general availability of Trainium2 instances, offering 40% better price performance than previous-generation instances. It’s going to be fascinating to see how the new Trainium2 UltraServers are adopted. There are some really interesting innovations in the UltraServers, including NeuronCores and NeuronLink low-latency connectivity between the chips in an instance and between instances, scaling up to 64 chips in a Trn2 UltraServer all acting together to support a training or inference workload.
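For anyone who wants to check whether these instances have reached their region yet, the standard EC2 API already covers it. Below is a quick sketch using boto3; the instance type name (trn2.48xlarge) and region are assumptions for illustration, so verify the exact names and availability against the AWS documentation.

```python
# Sketch: find which Availability Zones currently offer a given instance type.
# The instance type and region below are assumptions for illustration.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

offerings = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "instance-type", "Values": ["trn2.48xlarge"]}],
)

for offering in offerings["InstanceTypeOfferings"]:
    print(offering["InstanceType"], "available in", offering["Location"])
```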

I’m also pretty excited about several other new GPU and storage-optimized instances, like P5en, I8g, and I7ie, which will be really useful for WEKA customers deploying to AWS, as we’ll be able to offer more options for the flash-based performance tier in WEKA clusters. This will mean even faster performance and greater scale than ever before (stay tuned on this front).
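As a rough illustration of how you might size up those families for a flash tier, the sketch below queries EC2 for vCPU count, memory, and local NVMe capacity across the new storage-optimized families. The family patterns (i7ie.*, i8g.*) follow the announcements above, but treat this as an assumption-laden sketch and confirm names and regional availability in the AWS docs.

```python
# Sketch: compare vCPUs, memory, and local NVMe across storage-optimized
# instance families. Family name patterns are assumptions for illustration.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instance_types")
pages = paginator.paginate(
    Filters=[{"Name": "instance-type", "Values": ["i7ie.*", "i8g.*"]}]
)

for page in pages:
    for itype in page["InstanceTypes"]:
        storage_gb = itype.get("InstanceStorageInfo", {}).get("TotalSizeInGB", 0)
        print(
            itype["InstanceType"],
            itype["VCpuInfo"]["DefaultVCpus"], "vCPUs,",
            itype["MemoryInfo"]["SizeInMiB"] // 1024, "GiB RAM,",
            storage_gb, "GB local NVMe",
        )
```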

If you found this insightful, helpful, entertaining or anything else, let me know! And stay tuned for part 2.