By Michael Cui and Ramesh Radhakrishnan

High Performance Computing (HPC) is an area in which massive computing power is used to solve complex scientific and engineering problems. It often involves running a cluster of compute nodes to execute complex simulation code and/or process large quantities of data. HPC is widely applied across industries, for example drug discovery in the life sciences, risk modeling in finance, electronic design automation (EDA) in the chip industry, and image rendering in the movie industry. Many VMware customers run significant HPC workloads within their organizations. In fact, these workloads are often one way they differentiate and out-innovate their competition, as HPC has a direct impact on the design and implementation of the products and services they offer.

Performance is one of the most important characteristics of HPC. To obtain maximum performance, practitioners tend to run their HPC workloads on dedicated hardware, often composed of server-grade compute nodes interconnected by high-speed networks. Although virtualization is a rising trend for modernizing HPC, some professionals are still concerned that the additional software layer introduced by virtualization may impact performance.

Our team within the Office of the CTO has been focused on supporting HPC and machine learning workloads on the VMware platform. We believe that HPC can take advantage of the many benefits of virtualization and, in some cases, of the SDDC platform (heterogeneity, multi-tenancy, reproducibility, and automation, to name just a few) while still meeting performance goals.

Over the past several years, we have successfully run many HPC workloads, ranging from molecular dynamics to weather forecasting, with excellent performance on our virtual platform, and we have collaborated with several customers to virtualize their HPC footprints.

In this blog, we showcase our success with running HPC applications on the latest vSphere 7 platform, and we share data for both high-throughput and tightly coupled HPC workloads.

High-Throughput Workload Performance

BioPerf is a popular benchmark suite that includes several life sciences benchmarks. These benchmarks represent throughput-oriented HPC workloads that run on multiple cores simultaneously but do not require any message passing. This class also covers workloads such as Monte Carlo simulations used in finance or physics, where each job runs on a single core but the infrastructure must process thousands of jobs per simulation.
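
For readers less familiar with this class of workload, the short Python sketch below (an illustration of ours, not part of BioPerf) shows the pattern: many independent single-core jobs, here a toy Monte Carlo estimate, that never communicate and can simply be packed onto all the cores of a node.

    # throughput_sketch.py -- illustrative only, not part of BioPerf.
    # Runs many independent Monte Carlo jobs (pi estimation) across all cores;
    # the jobs never communicate, which is what makes the workload "high throughput".
    import random
    from multiprocessing import Pool

    def monte_carlo_job(samples: int) -> float:
        """One independent single-core job: estimate pi by random sampling."""
        hits = sum(1 for _ in range(samples)
                   if random.random() ** 2 + random.random() ** 2 <= 1.0)
        return 4.0 * hits / samples

    if __name__ == "__main__":
        jobs = [1_000_000] * 64          # e.g., 64 independent jobs
        with Pool() as pool:             # one worker process per available core
            results = pool.map(monte_carlo_job, jobs)
        print(f"completed {len(results)} jobs, mean estimate {sum(results) / len(results):.5f}")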

To objectively evaluate performance, we always compare apples-to-apples between bare-metal and virtual environments using the same hardware. That is, on the same testbed we configured two operating environments:

  • One was an operating system installed on a bare-metal server
  • The other was the same operating system running inside a VM on top of the VMware ESXi hypervisor

For this study, we used RHEL 8.1 as both the bare-metal operating system and guest operating system, and the testbed was a Dell R740XD server with dual Intel Xeon Gold 6248R CPUs and 384 GB of memory.

The results we obtained for BioPerf are illustrated in Figure 1. The X-axis shows the eight benchmarks from BioPerf, and the Y-axis shows the performance ratio between the virtual and bare-metal systems. We used wall clock time to measure throughput performance; for each benchmark, the performance ratio was calculated as the bare-metal wall clock time divided by the virtual wall clock time. As the figure shows, the performance of the BioPerf benchmarks in the virtual environment is very close to that on bare metal. With such minor performance differences, users can leverage the benefits brought by virtualization with essentially no performance overhead, and customers gain operational simplicity and uniformity by bringing these important workloads onto their enterprise IT infrastructure.
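
As a concrete illustration of that metric, the small helper below (hypothetical, not a tool used in the study) computes the ratio from two measured wall clock times.

    # perf_ratio.py -- hypothetical helper illustrating the Figure 1 metric.
    # Usage: python perf_ratio.py <bare_metal_seconds> <virtual_seconds>
    import sys

    bare_metal_s, virtual_s = map(float, sys.argv[1:3])
    # A ratio of 1.0 means parity; above 1.0 means the virtual run finished sooner.
    print(f"performance ratio (bare metal / virtual) = {bare_metal_s / virtual_s:.3f}")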

As in traditional bare-metal HPC systems, optimal performance requires careful tuning. For these experiments, we applied the recommended settings, from BIOS options to VM sizing, following the Running HPC and Machine Learning Workloads on VMware vSphere - Best Practices Guide white paper. The applied tunings are summarized in Table 1 below.

Table 1. Tunings applied for the high-throughput (BioPerf) tests

    Setting                   Value
    ------------------------  -------------------------
    BIOS Power Profile        Performance per watt (OS)
    BIOS Hyper-threading      On
    BIOS Node Interleaving    Off
    ESXi Power Policy         Balanced
    VM sizing                 Same as the physical host
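
For the "VM sizing: same as the physical host" guidance, one way to look up a host's physical core and memory counts programmatically is through the vSphere API. The pyVmomi sketch below is an illustration of ours, with placeholder vCenter address and credentials; it is not a step from the study.

    # host_sizing_sketch.py -- illustrative pyVmomi snippet, not an official tool.
    # Reads physical core and memory counts so a VM can be sized to match the host.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details -- replace with your own environment.
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder,
                                                       [vim.HostSystem], True)
        for host in view.view:
            hw = host.hardware
            print(f"{host.name}: {hw.cpuInfo.numCpuCores} cores, "
                  f"{hw.memorySize // (1024 ** 3)} GiB memory")
    finally:
        Disconnect(si)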

Tightly Coupled HPC Workload Performance

Our team also evaluated tightly coupled HPC applications, that is, message passing interface (MPI)-based workloads, and demonstrated promising results. These applications consist of parallel processes (MPI ranks) that leverage multiple cores and are architected to scale computation out to multiple compute servers (or VMs) so that a complex mathematical model or scientific simulation can be solved in a reasonable amount of time. Examples of tightly coupled HPC workloads include computational fluid dynamics (CFD) codes used to model airflow in automotive and airplane designs, weather research and forecasting models for predicting the weather, and reservoir simulation codes used in oil discovery.
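
To make the communication pattern concrete, the minimal mpi4py sketch below (our own illustration, not one of the benchmarked applications) has every MPI rank compute a partial result and then synchronize through an allreduce, the kind of collective operation that makes these workloads sensitive to interconnect latency.

    # mpi_sketch.py -- illustrative mpi4py example, not one of the benchmarked codes.
    # Launch with, for example:  mpirun -np 4 python mpi_sketch.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank works on its own slice of a global problem (here, summing 0..N-1).
    N = 1_000_000
    local = sum(range(rank, N, size))

    # The collective forces all ranks to exchange data and synchronize --
    # this is the latency-sensitive step that distinguishes tightly coupled codes.
    total = comm.allreduce(local, op=MPI.SUM)

    if rank == 0:
        print(f"{size} ranks computed total {total}")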

Figures 2-4 show both scaling behavior and the virtual-versus-bare-metal performance difference for three representative HPC applications in the CFD, weather, and science domains. All three applications demonstrate efficient speedups when computation is scaled out to multiple systems. The relative speedup of each application is plotted, with single-node bare-metal performance as the baseline.

We ran these tests on an HPC cluster deployed on Dell R640 vSAN Ready Nodes with dual Intel Xeon Gold 6240R CPUs. The compute nodes in the bare-metal cluster used all 48 cores and 256 GB of memory available on each host. For the virtual cluster, we configured each VM with 44 vCPUs and 192 GB of memory, based on the resources left available after applying the VM settings that deliver the best performance for this class of applications.

The charts demonstrate that MPI application performance running in a virtualized infrastructure (with proper tuning and following best practices for latency-sensitive applications in a virtual environment) is close to performance in a bare-metal infrastructure.

Again, we applied the performance tunings that were found to work best for MPI applications; they are summarized in Table 2 below. Because applications that use MPI for parallel communication demand low-latency networking, taking advantage of the VM Latency Sensitivity setting available in vSphere 7.0 is important for achieving the best performance.

Table 2. Tunings applied for the tightly coupled (MPI) tests

    Setting                   Value
    ------------------------  -------------------------------------------
    BIOS Power Profile        Performance per watt (OS)
    BIOS Hyper-threading      On
    BIOS Node Interleaving    Off
    BIOS SR-IOV               On
    ESXi Power Policy         High Performance
    VM Latency Sensitivity    High
    VM CPU Reservation        Enabled
    VM Memory Reservation     Enabled
    VM Sizing                 Maximum VM size with CPU/memory reservation
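
One way the VM-level settings in Table 2 (Latency Sensitivity and the CPU/memory reservations) might be applied programmatically is through the vSphere API. The pyVmomi sketch below is our own illustration, not the procedure used in the tests; the vCenter address, credentials, VM name, and reservation values are placeholder assumptions (the reservation numbers simply mirror a 44-vCPU, 192 GB VM on 2.4 GHz cores).

    # latency_tuning_sketch.py -- illustrative pyVmomi snippet; connection details,
    # VM name, and reservation values are placeholder assumptions.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        vm = content.searchIndex.FindByDnsName(None, "hpc-node-01.example.com", True)

        spec = vim.vm.ConfigSpec()
        # VM Latency Sensitivity = High
        spec.latencySensitivity = vim.LatencySensitivity(level='high')
        # Full CPU and memory reservations (placeholder values, in MHz and MB)
        spec.cpuAllocation = vim.ResourceAllocationInfo(reservation=105600)     # 44 vCPUs x 2400 MHz
        spec.memoryAllocation = vim.ResourceAllocationInfo(reservation=196608)  # 192 GB
        spec.memoryReservationLockedToMax = True

        task = vm.ReconfigVM_Task(spec=spec)
        print(f"reconfigure task submitted: {task.info.key}")
    finally:
        Disconnect(si)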

If you're interested in learning more about running HPC workloads on VMware, please visit our blogs at https://blogs.vmware.com/apps/hpc and https://octo.vmware.com/tag/hpc/ to read more about our previous and ongoing work.

About the Authors

Michael Cui is a Member of Technical Staff in the VMware Office of the CTO, focusing on virtualizing High Performance Computing. His expertise spans distributed systems and parallel computing. His daily work ranges from integrating various software and hardware solutions, to conducting proof-of-concept studies, to performance testing and tuning, to publishing technical papers. In addition, Michael serves on Hyperion's HPC Advisory Panel and reviews papers for several international conferences and journals, such as IPCCC, TC, and TSC. Previously, he was a research assistant and part-time instructor at the University of Pittsburgh. He holds PhD and Master's degrees in Computer Science from the University of Pittsburgh.

Ramesh Radhakrishnan is a Technical Director in the HPC/ML team within the VMware Office of the CTO. He has presented at key industry events such as NVIDIA GTC, VMworld, Dell Technologies World, and ARM TechCon, and he has 15 published patents. He received his PhD in Computer Science and Engineering from the University of Texas at Austin. Connect with him on LinkedIn.

Acknowledgments

The authors thank Rizwan Ali from Dell for the MPI performance test results.
