Recently, while surfing YouTube, I came across an old advertisement in which Ella Fitzgerald shattered a wine glass with her voice. The ad’s tagline was “Is it live, or is it Memorex?” It got me thinking about how vendors and customers view benchmarks – usually with great suspicion. The common view is that benchmarks are poor imitations of reality and easily manipulated. That can be the case, since any tool can be misused, but benchmarks remain an important tool for making infrastructure comparisons easier and validating vendor claims.
The Standard Performance Evaluation Corp. (SPEC®) has leveled the vendor playing field by enforcing standard testing and results-submission processes that minimize the possibility of gaming the system. SPEC has used I/O traces of actual applications to derive realistic workloads for its test suites, providing a means for apples-to-apples comparisons. Currently, it offers four different application simulations – VDI, database, software compiling, and EDA. Still, the only true way to evaluate performance is to run proof-of-concept (POC) testing in your environment with your applications.
In the enterprise, POC testing is common and usually straightforward. In research labs and supercomputing centers, however, POCs are generally not possible, so benchmarks are the only way to validate that these systems meet RFP specs. Many HPC benchmarks exist, but they focus on infrastructure performance rather than application performance, which is the only performance metric that matters to users.
The Department of Energy (DOE) held a benchmark workshop at Lawrence Berkeley Labs. Thought leaders attended from the NERSC, Oak Ridge, Argonne, Los Alamos, and Lawrence Livermore national labs, as well as the TACC and SDSC supercomputing centers. The goal was to discuss and gather benchmark requirements for data-intensive science. The scale and diversity of the workloads these labs support far surpass anything in the enterprise.
Interestingly, they struggle with the same problem as enterprises: highly subjective benchmark results. New applications such as artificial intelligence (AI) and machine learning (ML) further complicate application-level (end-to-end) performance testing by creating unique workloads and access patterns. In fact, there was unanimous agreement that AI and ML represent the greatest computing challenge going forward.
A few important takeaways on benchmarks from the event:
– Understand what you are testing and how the benchmark works. Benchmarks are highly tunable, and default settings are seldom ideal
– Specialized infrastructure-level benchmarks are useful, especially for system tuning, but they are limited in their ability to provide an accurate real-world perspective of performance
– Application-level benchmarks are ideal because they measure what the user actually experiences
– SPEC® is a good example of an application-level benchmark for the enterprise
– Few benchmarking tools are available that are designed to emulate AI and ML workloads
– Nothing beats seat-of-the-pants (i.e., POC) testing with your applications on your infrastructure
WekaIO Matrix benchmark figures are publicly posted on the SPEC® website so that IT organizations can see for themselves how we compare to other scale-out, high-performance storage solutions. An important factor to note is that WekaIO’s test system was hosted in Amazon’s public cloud, so there was no possibility of optimizing hardware or tuning network traffic, making this the ultimate case of WYSIWYG. Like third-party validations, benchmarks are valuable tools, but they should only be used for baseline comparisons. Contact WekaIO today about a free POC and judge for yourself.