Six Five On the Road Featuring CTO Shimon Ben-David

WEKA Chief Technology Officer Shimon Ben-David joins Six Five On the Road with hosts Dave Nicholson and Alastair Cooke for a conversation on how WEKA’s solutions are designed to keep storage technology in step with the rapidly evolving demands of today’s AI workloads.

Alastair Cooke: Welcome back to Six Five On the Road. This series of sessions is sponsored by Solidigm. And joining me for this session is my esteemed colleague, Dave Nicholson. Dave, it’s great to be on here with you. It’s always nice to have another face on these sessions.

Dave Nicholson: Absolutely. Good to be here with you, Alastair.

Alastair Cooke: And of course we do have another face joining us today.

Dave Nicholson: We certainly do.

Alastair Cooke: Shimon, you are with Weka. You are the CTO. So, Shimon Ben-David: Weka has made quite a lot of waves by providing high-performance, software-defined storage, and is on a mission to eliminate spinning disk from any use involving useful data.

Dave Nicholson: I think the word was eradicate.

Shimon Ben-David: Eradicate.

Dave Nicholson: Not just eliminate.

Alastair Cooke: So why eradicate spinning disk? We haven’t been able to eradicate tape yet. Why eradicate spinning disk?

Shimon Ben-David: So I think if we’re looking at new workloads, especially with AI and GenAI workloads, there are massive amounts of data being utilized, whether for training or inferencing, throughout multiple organizations. Data is being accumulated and computed on. So if we’re looking at the implications of current spinning media versus newer flash media, like the Solidigm QLC drives that are being onboarded, there’s massive value: cheaper storage at scale, especially with data reduction technologies like the ones we’re implementing, and better power utilization at scale, for these large AI projects and, frankly, for large, newer compute environments in general.
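As a rough sketch of that power argument, here is a back-of-envelope comparison. Every per-device figure below is an illustrative assumption, not a measured spec for any Solidigm or WEKA product:

    # Back-of-envelope: active power per petabyte, HDD versus QLC flash.
    # All device numbers are illustrative assumptions.

    def watts_per_petabyte(capacity_tb: float, watts: float) -> float:
        """Power drawn per petabyte of raw capacity."""
        drives_per_pb = 1000 / capacity_tb
        return drives_per_pb * watts

    hdd = watts_per_petabyte(capacity_tb=20, watts=9)      # assumed 20 TB HDD at ~9 W
    qlc = watts_per_petabyte(capacity_tb=61.44, watts=20)  # assumed 61.44 TB QLC SSD at ~20 W

    print(f"HDD: {hdd:.0f} W/PB, QLC: {qlc:.0f} W/PB")
    # Data reduction (say 2:1) halves the effective flash figure again:
    print(f"QLC at 2:1 reduction: {qlc / 2:.0f} W/PB")

Under these assumptions the flash system draws less power per raw petabyte, and data reduction widens the gap; at exabyte scale the difference compounds.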

Dave Nicholson: So when you talk about the Weka platform, what does that look like in terms of physical infrastructure? Let’s think in terms of what we loosely refer to as a server: something bounded by sheet-metal walls that has some amount of CPU, networking, and memory, and in this case, from a storage perspective, solid-state devices like Solidigm’s. What does that look like for AI compared to more traditional workloads? How does it change from Weka’s perspective?

Shimon Ben-David: So if we’re looking at the compute side, the server side, usually with some sort of accelerator, a GPU, an IPU, or honestly multiple different accelerators, what that did is shrink the footprint of data ingestion. In the past you had HPC centers with thousands of servers, each of them processing a trickle of data. That has shrunk, and now a single server is equivalent to 10 racks of previous environments. So suddenly that, as you call it, server, a sheet-metal box with CPUs and GPUs and thousands of cores, is able to ingest terabytes of data per day and get to a meaningful output. The implication is that we need to feed these servers with that massive amount of data at high performance, to make sure the GPUs or the accelerators are being utilized, so I can get to an optimal time to my business value, to my outcome, whether that’s training a new model or inferencing on data that I already have.

So that ability to compute on more data faster creates new challenges. The first challenge, by the way, is the networking. You take that massive compute power, and how do you make sure you’re not feeding it with a straw? If you look at previous storage environments or previous protocols, just on the networking side, even if you had hundreds of gigabits per second, the protocols themselves were spoon-feeding the GPUs. And as a result, you see GPUs running at 30% utilization out of 100% of their capability. So that’s what we set out to do with Weka: to create a storage environment that can drive the network that is constantly increasing, 100 gig, 200 gig, 400, 800 and more coming, with Ethernet and InfiniBand, and feed the GPUs with that massive amount of data.
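To make the feeding-through-a-straw point concrete, here is a quick sketch. The per-server demand figure is an assumed number for illustration, not a WEKA benchmark:

    # Can one network link keep a GPU server fed with training data?
    # The 40 GB/s demand figure is an illustrative assumption.

    def fed_fraction(link_gbits: float, demand_gbytes: float) -> float:
        """Fraction of the server's data appetite one link satisfies."""
        link_gbytes = link_gbits / 8  # bits to bytes
        return min(1.0, link_gbytes / demand_gbytes)

    DEMAND = 40.0  # assumed GB/s one dense GPU server can consume

    for link in (100, 200, 400, 800):  # the Ethernet/InfiniBand speeds above
        print(f"{link} Gb/s link: GPUs fed at {fed_fraction(link, DEMAND):.0%}")

At 100 Gb/s this sketch lands near the 30% utilization figure mentioned above; from 400 Gb/s up, the link stops being the bottleneck, provided the storage protocol can actually fill it.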

To do that, we actually created an environment that is based only on NVMe drives, and that’s a key design consideration. We threw away all of the existing know-how on how to design a parallel distributed environment, and we just re-architected everything. We’ve been out here for 10 years, so it took us some time, and we’re already in some massive projects; it’s been validated in the field. By completely re-architecting everything for NVMe, we started with TLCs, and now we’re able to also use QLCs for large-capacity environments. That’s a complete transformation in how these GPUs can efficiently get their data. As a result, we’re seeing GPU utilization go from 30% or 35% to 85%, 90% and more. And the customer value is that their job just gets faster, and honestly, in the same amount of time, they can do much more. I’ll stop there.

Alastair Cooke: I want to drill a little into that transition from TLC to QLC, because every time a higher-density solid-state storage, NAND storage, comes out, we get the suggestion that it’s going to be less reliable, that we’re not going to be able to get the transactional rates through it, that it’s only going to be good for cheap and deep. And yet the reality we are seeing is that it is definitely deep storage, but behind an NVMe channel that RAID performance is still spectacular. And what I see driving that is, again, higher density and lower power utilization; the whole density of things in data centers is getting larger along with it.

Shimon Ben-David: Yeah, I completely agree. By the way, there’s always the thought that maybe there’s a new type of media and that media is a one-trick pony: I can only use it to back up data, maybe to do fast restores, don’t utilize it too much. I think the truth is that it’s not a one-trick pony so much as people trying to use new media in the ways they did before, and you need to accommodate for that. So if you just try to blast a small number of NVMe drives, even TLCs or QLCs, with massive write IOPS in the traditional way storage environments worked, that will create a problem. The way we solved it is by distributing our workloads in an even fashion across hundreds or thousands of these TLCs and QLCs. So when we get a workload, everything is parallelized in terms of metadata, and then everything is, again, parallelized in terms of data.

So if we’re looking at small-scale projects, 100 terabytes on a small number of NVMe drives, that doesn’t make as much sense. But if we’re looking at the newer projects, and these are the interesting projects, hundreds of terabytes, hundreds of petabytes, already heading toward exabytes: you put those massive amounts of QLCs out there, and our code parallelizes across them without hotspotting any of them, so we guarantee utilizing all of them in equal fashion. By working with them in this new way, you’re able to get massive performance out of them, because you’re parallelizing reads and writes and small IOPS across all of them, and you make sure their wear level stays equal and controlled.
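A minimal sketch of what hotspot-free placement can look like in code, hashing chunks evenly across many drives. This illustrates the general technique only; it is not WEKA’s actual placement algorithm:

    import hashlib
    from collections import Counter

    NUM_DRIVES = 1000  # hundreds or thousands of TLC/QLC devices

    def drive_for_chunk(volume_id: str, chunk_index: int) -> int:
        """Hash each fixed-size chunk to a drive so no single device
        becomes a hotspot and wear stays roughly equal."""
        key = f"{volume_id}:{chunk_index}".encode()
        digest = hashlib.sha256(key).digest()
        return int.from_bytes(digest[:8], "big") % NUM_DRIVES

    # A large write is split into chunks that land all over the cluster:
    placements = [drive_for_chunk("vol-42", i) for i in range(10_000)]
    drive, count = Counter(placements).most_common(1)[0]
    print(f"busiest drive holds {count} of 10,000 chunks")  # near the 10-chunk average

Because every chunk’s location is computed rather than cached on one device, reads and writes spread across the whole fleet, which is the property that keeps per-drive load and wear even.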

Dave Nicholson: Let’s deconstruct this a little bit, if you don’t mind. So we’re talking about TLC versus QLC, the underlying NAND. Then there’s the form factor and the interface, and it’s interesting and exciting to talk about the frontiers of AI and NVMe and what that looks like. But let’s take a little step back in history, at least leading-edge history, to the here and now. We still have SAS and SATA interface devices. We still have these things called RAID controllers in that environment, hardware RAID controllers no less. Can you walk us through the transitions we’re going through there, so the audience has a big-picture understanding? Because in the not-too-distant past, and in the present, we have SAS- and SATA-interfaced devices, call them SSDs, in a certain form factor, hot-swappable, fantastic. They’ve got certain characteristics associated with them. You might have a hardware RAID controller in front of those to deliver caching, or-

Shimon Ben-David: We do not, people do.

Dave Nicholson: You do not, people do. Exactly. This is my point. So I’m saying, “Okay, Shimon, my units of scale are these servers with SSDs in them, SATA, SAS devices, and I’m going to deploy 100 of these. Weka, what are you going to do for me?” What are you going to tell me? What would you counsel me in that regard?

Shimon Ben-David: So first of all, the way we look at it, when you’re utilizing some specific hardware component, you’re tied in, whether to that RAID controller or to a vendor, and honestly you’re tied to your locality. You cannot go to other environments, a cloud or your own private cloud, that don’t have this. So it really limits you. And more than that, if you’re looking at the value: in the past, these hardware components were essential to accelerate your workload. I had to have a smart RAID controller so I could RAID across multiple devices and offload that work from the CPU to the controller. What we’re seeing now is that there are already methods of doing it very efficiently in software, in code. You just need to work harder.
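As a minimal sketch of RAID-style protection done purely in code, with no controller in the path, here is single-parity XOR over a stripe. It is shown only to make the point that the controller’s core trick fits in software; WEKA’s actual data protection is a distributed scheme, not this:

    # Software parity: compute it, lose a chunk, rebuild it. No RAID card.

    def xor_parity(chunks: list[bytes]) -> bytes:
        """Compute a parity chunk across equal-sized data chunks."""
        parity = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                parity[i] ^= b
        return bytes(parity)

    def recover(surviving: list[bytes], parity: bytes) -> bytes:
        """Rebuild the one lost chunk from the survivors plus parity."""
        return xor_parity(surviving + [parity])

    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_parity(stripe)
    lost = stripe.pop(1)                    # a device fails, taking one chunk
    assert recover(stripe, parity) == lost  # the CPU rebuilds it in code
    print("rebuilt:", recover(stripe, parity))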

Dave Nicholson: Despite the fact that that’s chewing up some cycles from other processors that are not on board a dedicated piece of hardware?

Shimon Ben-David: Exactly. If you want to scale up and down, a RAID controller is fantastic. If you want to scale out, then suddenly a RAID controller is problematic. I’ll also say, one of our other observations is that it introduces another layer that you don’t control. If I’m writing IOs to a system, and it goes into servers, and these servers have RAID controllers, then those RAID controllers are essentially a middleman between my IOs and my flash devices. I didn’t write those RAID controllers, I didn’t write the logic, I don’t know what’s going on inside them. So when something happens, there’s another layer I need to tackle. If you look at what we are doing, where everything is in code, from the moment the data leaves the compute client, or actually the Weka client on that compute client, up to the moment it’s spread out, shared, and distributed across all of the NVMe drives, we control the entire IO path and all of the decisions. By not relying on a hardware component we don’t control, we’re able to make smarter and more efficient decisions.

I’ll give you an example. Let’s say I have 100 servers, to your example, and I don’t have these RAID controllers. We’re able to look at all of the NVMe drives, down to the queue depth per drive, and take smart decisions. Maybe I’ve pre-prepared stripes, and now I see that some of the drives are a millisecond slower than the others. Maybe I’ll decide not to work with them right now; maybe I’ll take two or three milliseconds to let their queue depths decrease while I work on the other drives. I wouldn’t be able to do that if there were another hardware abstraction layer I don’t control.
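A minimal sketch of that queue-depth-aware decision, with hypothetical names and thresholds; the real scheduler is certainly more involved:

    from dataclasses import dataclass

    @dataclass
    class Nvme:
        drive_id: int
        queue_depth: int  # outstanding IOs, visible because the stack is software

    def pick_stripe(drives: list[Nvme], width: int, max_qd: int = 32) -> list[Nvme]:
        """Choose `width` drives for the next stripe, preferring the
        least-busy devices and skipping any above the queue-depth cutoff."""
        idle_first = sorted(drives, key=lambda d: d.queue_depth)
        healthy = [d for d in idle_first if d.queue_depth <= max_qd]
        return (healthy or idle_first)[:width]  # fall back if all are busy

    drives = [Nvme(i, qd) for i, qd in enumerate([4, 90, 7, 2, 55, 3])]
    print([d.drive_id for d in pick_stripe(drives, width=3)])  # [3, 5, 0], the quiet ones

The momentarily slow drives (queue depths 90 and 55 here) simply sit out a round while their queues drain, which is exactly the decision a hardware middleman would hide.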

Plus, if I now want to take the same code, Weka code for example, and run it on cloud environments or on other private cloud environments, nobody guarantees that these hardware components are there. We’ve been talking about RAID controllers, but there are also NVRAMs, NVDIMMs, storage-class memory. To us, all of these hardware components are a shortcut that people had to take in the past, or chose to take in the past, and that is no longer necessary.

Dave Nicholson: So Alastair, if you’ll indulge me, I want to just follow up a little bit on this. So let’s take that 100-node example, and I refer to them as units of scale, for lack of a better term. Is it appropriate to think of those units of scale as potentially having GPU(s) in them, processing power, along with storage devices? Yes or no?

Shimon Ben-David: Yes.

Dave Nicholson: Yes. You would?

Shimon Ben-David: Yes.

Dave Nicholson: Okay.

Shimon Ben-David: We can talk about it more.

Dave Nicholson: Okay, so this is what I want to know if I’m a practitioner. I have my 100 units of scale. I wake up in the morning having wonderful dreams of GPUs. I don’t necessarily dream of storage devices. It’s a reality.

Shimon Ben-David: It is.

Dave Nicholson: I don’t like that, but it’s the truth. So people are thinking of those GPUs, and one thing they want to know for sure is: if one of my discrete storage devices goes down, does Weka take that unit of scale, that node, offline? That expensive GPU that’s doing all that work, does it go down when a single device goes down? How do you manage that?

Alastair Cooke: Yeah. Before you answer, I’ve just got to point out that you’ve got one minute.

Dave Nicholson: Yeah.

Shimon Ben-David: Okay. So-

Dave Nicholson: Well, we’ll go longer. We’ll go longer.

Shimon Ben-David: So traditionally, shared storage has been deployed as an appliance that is external to the environment: I have my compute and then I have my storage. In that model, people would not run the shared storage environment on the GPU servers. We were able to converge the storage and the compute concurrently on the same servers in a safe fashion, and now we’re able to distribute the data and create a shared and protected environment. If some of these NVMe drives fail, for example, we’ll just fail them. The compute will still work. Even if all of the NVMe drives on our compute-

Dave Nicholson: Just the devices, not the whole node, not the whole node? Okay.

Shimon Ben-David: Yeah. We do it in a safe fashion where we only control the storage layer. Usually, what realistically happens is that the compute fails, which fails the storage part, and then we’ll also rebuild around that.

Dave Nicholson: Okay. Makes sense.

Alastair Cooke: So it’s a distributed situation across the entire cluster.

Shimon Ben-David: It’s no-footprint storage, converged with the GPUs. You’re getting the performance and capabilities of a high-performance shared environment without adding another box.

Alastair Cooke: Now, I want to keep going as well, because I’m really interested in the implications of data locality for those GPUs doing the processing.

Shimon Ben-David: No data locality.

Alastair Cooke: Okay, that’s interesting.

Dave Nicholson: Is that okay? Is that okay because of-

Shimon Ben-David: It’s more than okay. When you think about data locality, data locality is a compromise. Data locality was created when you had small network pipes and you had to optimize for hard drives. And again, we’re going back to why hard drives made sense in the past and don’t make sense anymore. Now, when you look at network pipes that are faster than the memory speed and the CPUs themselves, we’re able to spread everything across the entire distributed environment in a way that working with a Weka mount point is faster than working with your local NVMe drives.

Dave Nicholson: Okay, so you’re saying assume fast enough networking.

Shimon Ben-David: Assume at least 100-gig networking.
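A quick back-of-envelope on why that assumption holds, using an assumed figure of roughly 7 GB/s sequential read for a single PCIe Gen4 NVMe drive; illustrative numbers, not benchmarks:

    # Network pipe versus one local NVMe drive. Illustrative numbers only.

    LOCAL_NVME_GBYTES = 7.0  # assumed single-drive sequential read, PCIe Gen4

    for link_gbits in (100, 200, 400):
        link_gbytes = link_gbits / 8  # bits to bytes
        print(f"{link_gbits} Gb/s link ~ {link_gbytes:.1f} GB/s, "
              f"{link_gbytes / LOCAL_NVME_GBYTES:.1f}x one local NVMe")

Even at 100 Gb/s the pipe already outruns a single local drive, which is why a parallel remote mount can beat local disks once the storage software can fill the link.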

Dave Nicholson: Fair enough, fair enough.

Alastair Cooke: I think design for is the important thing.

Shimon Ben-David: Design for. Yeah, designed for. Thank you.

Dave Nicholson: Fair enough.

Alastair Cooke: And to avoid getting onto the three-hour-long podcast that Dave would like-

Dave Nicholson: Oh, don’t avoid it.

Alastair Cooke: Shimon, clearly we haven’t been able to go nearly deep enough, wide enough, or high enough in this topic. Where’s a great place for the viewers here at Six Five On the Road to find more about Weka?

Shimon Ben-David: Just Google Weka, or WEKApod; we have the new WEKApod appliance that we announced for SuperPOD certification: Weka, NVIDIA, SuperPOD. We are out there on-prem, in the cloud-

Alastair Cooke: Everywhere to be consumed.

Shimon Ben-David: Everywhere to be consumed, yeah. And honestly, that’s where I would look for Weka.

Alastair Cooke: Well, thank you, Shimon from Weka, and thank you for joining us. Thanks, David Nicholson, for joining me, Alastair Cooke, here on Six Five Media On the Road. There’s plenty of great content from this Solidigm series, so stay tuned to us wherever you’re consuming content.