Hot Take: How NVIDIA Just Changed the Game for AI in the Cloud
While the tech world spent this week marveling at the engineering behind the new Blackwell system, NVIDIA rolled out another innovation that could prove even more transformative for our industry – NVIDIA Inference Microservices (NIMs).
Imagine taking your preferred foundation model, putting it in a Docker container, training it on your data, and dropping the result into a CI/CD pipeline: that's the promise inside NIMs. If broadly adopted, this approach will do two things.
First, combined with the increased horsepower available in Blackwell, it will enable enterprises to collapse the time to market for AI applications from months to hours.
Second, enterprises will find they have far more flexibility and control over where they train their AI models and how those trained models get deployed.
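To make the application side of that concrete, here is a minimal sketch of what consuming such a containerized model could look like, assuming the container exposes an OpenAI-style HTTP endpoint on a local port; the URL, port, and model name are illustrative assumptions rather than documented NIM specifics.

```python
# Minimal sketch: an application calling a containerized inference
# microservice over HTTP. The host, port, path, and model name are
# illustrative assumptions, not official NIM values.
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed local container endpoint

payload = {
    "model": "my-org/llama2-70b-custom",  # hypothetical fine-tuned model name
    "messages": [{"role": "user", "content": "Summarize our Q3 trial results."}],
    "max_tokens": 256,
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the model sits behind a plain HTTP interface inside a container, it can be versioned, tested, and promoted through a pipeline like any other service.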
Today’s Enterprise AI Model Training: Go Fast or Go Controlled
Enterprises incorporating AI into their applications today have two choices. First, they can leverage a managed AI service, such as Microsoft Copilot, Amazon Q, or OpenAI. With easy-to-use APIs, this is a great option for enterprises looking to get their AI strategy going with a proof of concept, a completely new application, or a quick win to show the board. However, this approach gives away control over most of the aspects central to their mission. In the managed AI model, everything is shared outside the organization (and possibly publicly): the data used for training, tuning, and inference; the tuning or customizations that fit the model to the organization's specific needs; even the choice of what infrastructure supports the application. These are all decisions most enterprises lose control of when they select a managed AI deployment model.
Alternatively, many enterprises today build bespoke AI models in-house using open-source models like Llama 2. This approach gives enterprises the full control they seek over data, model development, and deployment options. However, it is extremely complex and time-consuming. The organization has to procure, build, and manage the infrastructure to train the models. It also has to hire AI experts who know how to train the models on proprietary data and then deploy them to production.
During GTC, the NVIDIA folks described NIMs as the "best of both worlds": the simplicity and speed of the managed AI approach combined with the full control over the model, data, and deployment that most enterprises prefer in the bespoke approach. But with the right infrastructure underpinnings, there's a lot more to it than that. The containerization in NIMs means AI models could be incorporated into a CI/CD pipeline, which would be another step-change acceleration in time to market for AI applications themselves.
Enabling Speed and Agility for Enterprise AI
Today, the complexity of rolling out trained AI models into an actual application is slowing down time to market. One estimate put training the Llama 2-70B model from scratch at 1.7 million GPU hours. That would take a single NVIDIA H100 GPU roughly 194 years, or a 1,000-GPU cluster about 2.3 months. Then it takes another four to six months for an enterprise to incorporate that model into its application. So actual enterprise adoption of AI will hinge just as much on the ability to incorporate AI models, trained securely and safely on proprietary datasets, into applications, and that means CI/CD.
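The back-of-the-envelope math behind those figures is simple enough to check:

```python
# Rough arithmetic behind the Llama 2-70B training estimate cited above.
GPU_HOURS = 1_700_000                              # estimated GPU hours to train from scratch

single_gpu_years = GPU_HOURS / (24 * 365)          # one GPU running around the clock
cluster_months = GPU_HOURS / 1_000 / 24 / 30.44    # 1,000 GPUs, average-length months

print(f"Single GPU: ~{single_gpu_years:.0f} years")        # ~194 years
print(f"1,000-GPU cluster: ~{cluster_months:.1f} months")  # ~2.3 months
```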
A popular example of this is in healthcare. A drug manufacturer could bring any open-source foundation model it prefers in-house and train it on its proprietary data set. The resulting custom AI model, delivered as a NIM, can then be deployed to whatever infrastructure makes the most sense, on-premises or in the cloud of the organization's choice. The same pattern applies across almost any industry, including automotive, electronic design, retail, and warehousing.
This is where the containerized approach in NIMs could be really valuable for enterprises looking to incorporate fully trained models into their applications. It'll also likely enable a decentralized development model, where a dedicated foundation AI team concentrates on building and training next-generation models using the company's proprietary data and internal algorithms. Separately, application teams would be able to pull the latest fully validated, vetted, and approved models and incorporate them directly into their development pipeline. This model should shrink the months-long cycles it takes today to get a trained model into production down to days or even hours.
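As a loose illustration of the hand-off between those two teams, here is a hypothetical pipeline step in which an application team pins its deployment to the newest model container the foundation team has marked as approved; the file names, registry layout, and metadata fields are invented for the example.

```python
# Hypothetical CI/CD step: pin an application deployment to the latest
# model container the foundation AI team has marked as approved.
# File paths and metadata fields below are illustrative only.
import json
from pathlib import Path

APPROVED_MODELS = Path("models/approved.json")      # published by the foundation AI team
DEPLOY_CONFIG = Path("deploy/model_version.json")   # consumed by the app's deployment

def latest_approved(models_file: Path) -> dict:
    """Return the newest model entry flagged as approved."""
    entries = json.loads(models_file.read_text())
    approved = [m for m in entries if m.get("status") == "approved"]
    return max(approved, key=lambda m: m["released"])  # ISO-8601 dates sort lexically

if __name__ == "__main__":
    model = latest_approved(APPROVED_MODELS)
    DEPLOY_CONFIG.write_text(
        json.dumps({"image": model["image"], "tag": model["tag"]}, indent=2)
    )
    print(f"Pinned deployment to {model['image']}:{model['tag']}")
```

In practice the same idea could live in whatever CI system the application team already uses; the point is that a validated model becomes just another versioned artifact the pipeline can pull.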
Flexible Deployment Models Require Flexible Data
The implications of AI model containerization via NIMs extend to the cloud providers themselves. You can deploy your NIMs anywhere you want, whether in your own data center or on DGX Cloud, and there is now a long list of cloud providers that offer or plan to support DGX, including all the major cloud providers and many of the newly minted GPU cloud providers like Applied Digital and Yotta. So the availability of DGX to enterprises in any deployment model, along with the flexibility enabled by NIMs, gives enterprises a level of flexibility and control they did not previously have.
The one gap here concerns the flexibility of the data itself. To build off Salesforce and Matthew McConaughey: if data is the new gold, where will you mine it? Put another way, are you going to bring your data to the model, or are you going to bring your model to the data? Given concerns around AI safety and data privacy, it's likely that most enterprises will choose the latter, or at least approach the former with extreme caution. Enterprises would be smart to bring the same level of deployment flexibility to their data that NIMs promise to bring to AI applications: build it anywhere, deploy it anywhere.
It turns out WEKA knows a few things about infrastructure flexibility, having built an AI-native data platform entirely in software that deploys to any cloud infrastructure, most of the new GPU cloud offerings, and any commodity on-premises infrastructure using the same software. Further, with WEKA Snap to Object, enterprises can shift their deployment models at any time. So with WEKA and the NVIDIA stack of services and tools for AI application development, enterprises have the ultimate flexibility to build and run AI applications wherever it makes sense for their business.