Hot Take: Will Data Egress Spark a New Cloud Pricing War?
When Google Cloud announced it would waive egress fees for customers who wished to exit its cloud, it looked like a helpful update for customers, a PR stunt, a way to get ahead of emerging EU data rules, or all three, depending on your point of view. Now that AWS has followed suit, it seems inevitable that Microsoft will do the same in short order. This will be welcome news for many customers, who now have greater flexibility in their choice of cloud provider with one fewer lock-in fee to deal with. Overwhelmingly, customers are looking for transparency and predictability when it comes to the cost of cloud services.
It’s worth pointing out that data egress fees are just one form of data transfer fee, and data transfer is one of the most notorious hidden costs in the cloud. Part of the challenge is that data transfer fees are a charge for network traffic, which is difficult to predict until users are actively working with the data. Charges can also be levied for moving data from one zone to another, one region to another, or even from one application to another without ever leaving the cloud provider’s network. Most customers overlook these costs, which can mount up quickly, and are often unaware of them until they receive their end-of-month invoice.
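To make the different categories of data transfer charges concrete, here is a minimal Python sketch that totals a month of transfers by category. The per-GB rates, the category names, and the `Transfer` and `estimate_transfer_cost` names are all hypothetical placeholders for illustration, not any provider's actual price list.

```python
from dataclasses import dataclass

# HYPOTHETICAL per-GB rates, for illustration only. Real rates vary by
# provider, region, and volume tier; consult your provider's price list.
HYPOTHETICAL_RATES_USD_PER_GB = {
    "inter_zone": 0.01,
    "inter_region": 0.02,
    "internet_egress": 0.09,
}


@dataclass
class Transfer:
    kind: str         # one of the keys in the rate table
    gigabytes: float  # amount of data moved


def estimate_transfer_cost(transfers, rates=HYPOTHETICAL_RATES_USD_PER_GB):
    """Return the estimated cost in USD for each transfer category seen."""
    totals = {}
    for t in transfers:
        totals[t.kind] = totals.get(t.kind, 0.0) + t.gigabytes * rates[t.kind]
    return totals


if __name__ == "__main__":
    month = [
        Transfer("inter_zone", 1000),
        Transfer("inter_region", 250),
        Transfer("internet_egress", 100),
    ]
    for kind, cost in estimate_transfer_cost(month).items():
        print(f"{kind}: ${cost:.2f}")
```

Even with placeholder rates, a breakdown like this shows how traffic that never leaves the provider's network can still contribute to the bill.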
With the emergence of AI, challenges around data transfer costs will only multiply. Data sets for model training and tuning are growing exponentially, and the performance demands of AI applications mean data needs to be closely coupled with compute. As a result, organizations building AI applications find they have high data transfer requirements as they move data between performance and capacity storage tiers, with each step potentially incurring another data transfer fee. Because basic data transfer costs lack transparency, customers are often surprised when data transfer shows up as a major cost driver on their cloud bill. The high cost associated with actually moving data out of a cloud environment further erodes trust in the intentions of the cloud provider. It’s no wonder egress costs are often associated with keeping customers “captive” to a single cloud provider and its associated services.
It’s fantastic news that the cloud providers are starting to line up and offer to waive the data egress charges if a customer wants to leave their cloud entirely. Since data egress is only one piece of the data transfer puzzle, here are some ways you can mitigate other data transfer fees in your environment, and how WEKA can help.
3 Strategies for Navigating Data Transfer Policies
1. Run Compute in the Cloud… but Keep Your Data
The most reliable way to keep data egress costs to a minimum is to never retain your data in the cloud to begin with. Data-intensive organizations with a hybrid cloud approach can benefit most from this strategy. When you collect data in your on-prem data lake, you can treat cloud resources as fully ephemeral (temporary). How it works: spin up the cloud computing resources you need to run your analysis and transfer the data set into the cloud. Once your analysis is complete, keep only the results, delete everything else, and spin the entire analysis environment back down. Because the data transferred into the cloud is disposable, only the results need to be brought back on-premises, which minimizes outbound network transfers. This gives companies the flexibility to choose the best cloud platform for their specific needs without an all-or-nothing commitment.
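The ephemeral workflow can be sketched in a few lines of Python. This is a local stand-in under stated assumptions: a temporary directory plays the role of the short-lived cloud environment, and `run_ephemeral_analysis` and its `analyze` callback are hypothetical names, not part of any cloud or WEKA API.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_ephemeral_analysis(
    dataset: Path,
    results_dir: Path,
    analyze: Callable[[Path], str],
) -> Path:
    """Stage a data set in a throwaway workspace and keep only the results."""
    results_dir.mkdir(parents=True, exist_ok=True)
    # "Spin up": the temporary directory stands in for the cloud environment.
    with tempfile.TemporaryDirectory() as workspace:
        staged = Path(workspace) / dataset.name
        shutil.copy2(dataset, staged)   # transfer the data set in
        result_text = analyze(staged)   # run the analysis next to the data
        result_path = results_dir / f"{dataset.stem}.result.txt"
        result_path.write_text(result_text)  # bring back only the results
    # "Spin down": the workspace and the staged copy are gone at this point.
    return result_path
```

The source data set never leaves its original location; only the (usually much smaller) result file crosses back.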
2. Use Incremental Remote Backups
Remote backups are a critical part of any company’s resilience and continuity strategy. However, most cloud providers lack native options for generating recovery-ready copies to a second availability zone or region, let alone to a second cloud provider or on-prem data center. The WEKA® Data Platform provides a Snap-to-Object (S2O) capability that can attach multiple S3 object stores. One can be local, as part of the global namespace, while the other can be a remote bucket: on-premises, in a separate region within the same cloud vendor, or in an entirely different cloud. The WEKA-created DR copy can operate as a fully functional remote data platform. WEKA’s snapshot capability supports incremental snapshots, so only the changes made since the prior snapshot are transferred, minimizing egress costs from one cloud vendor to another.
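The general idea behind incremental transfer can be illustrated with a small Python sketch that compares two versions of a data set block by block and reports only the blocks that changed. This is a simplified illustration using fixed-size blocks and SHA-256 digests; it is not WEKA's snapshot implementation, and the function names are hypothetical.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes, block_size: int = 4096) -> list[int]:
    """Indices of blocks in `current` that are new or differ from `previous`.

    Simplification: blocks removed by truncation are not reported.
    """
    old = block_digests(previous, block_size)
    new = block_digests(current, block_size)
    return [i for i, d in enumerate(new) if i >= len(old) or old[i] != d]


def incremental_transfer_bytes(previous: bytes, current: bytes, block_size: int = 4096) -> int:
    """Bytes that would need to cross the network for an incremental copy."""
    return sum(
        len(current[i * block_size:(i + 1) * block_size])
        for i in changed_blocks(previous, current, block_size)
    )
```

Comparing `incremental_transfer_bytes(previous, current)` with `len(current)` gives a rough sense of how much network traffic an incremental copy saves over a full one.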
3. Move Fewer Bits
Data egress charges are calculated based on the amount of physical data moved on the network. WEKA offers two capabilities that minimize the amount of data moved across or out of the cloud network. First, WEKA supports up to 1024 filesystems in a single namespace, allowing administrators to limit remote snapshots to a subset of the entire data set. Because snapshots are associated with a filesystem, you can use the snapshot capabilities described above to move only a limited data set. This capability is also very effective for customers with a hybrid cloud strategy, as individual filesystems can be snapshotted to the cloud while the majority of data remains on-premises. Second, minimizing the physical footprint of the data store also reduces overall costs, including storage and network transfer. The WEKA Data Platform supports a cluster-wide data reduction feature that can be activated for individual filesystems. This capability incorporates block-variable differential compression and advanced de-duplication techniques across the filesystems, significantly reducing the required storage capacity for user data and delivering substantial cost savings.
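To show why a smaller physical footprint means fewer bytes to move, here is a minimal Python sketch of fixed-size block de-duplication. It is a deliberately simpler technique than the block-variable differential compression described above and is not WEKA's implementation; `deduplicated_size` is a hypothetical helper.

```python
import hashlib


def deduplicated_size(data: bytes, block_size: int = 4096) -> int:
    """Bytes left after storing each distinct fixed-size block only once."""
    seen = set()
    unique_bytes = 0
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:  # first time this block content appears
            seen.add(digest)
            unique_bytes += len(block)
    return unique_bytes


if __name__ == "__main__":
    # Three identical blocks followed by one different block.
    payload = b"a" * 4096 * 3 + b"b" * 4096
    print(len(payload), "logical bytes ->", deduplicated_size(payload), "unique bytes")
```

The more repetition a data set contains, the larger the gap between logical and unique bytes, and the less data has to be stored or transferred.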
The Takeaway
Companies are looking for better predictability in cloud costs to eliminate surprise bills and will see these updates as welcome changes. While WEKA can’t eliminate data transfer costs entirely – only the cloud vendors can do that – we can help enterprises enhance their multicloud or hybrid cloud strategies to ensure greater data liquidity and in turn greater predictability in cloud costs.