As of October 20, 2021, SAP HANA 1.0 SPS 12 Revision 122.19 and later is supported for production use on Intel Cascade Lake based server systems with vSphere 7.0 U2. Please review SAP note "3102813 - SAP HANA on VMware vSphere 7.0 U2 with up to 12 TB 448 vCPUs VM sizes" for details.
This is a very important milestone for VMware and all its SAP HANA customers and partners. It allows them to deploy new SAP HANA VMs, or seamlessly extend existing ones, to support data growth up to 12 TB while keeping the operational model and SLAs that customers enjoy with vSphere-virtualized SAP HANA systems.
CPU architecture: Intel Cascade Lake processors (TDIv5 model) on 2-, 4-, and 8-socket systems
The minimum size of a virtual SAP HANA instance is a half socket (represented by at least 8 physical cores) and 128 GB of RAM.
The maximum size of a virtual SAP HANA instance on Intel Cascade Lake based systems is 448 vCPUs with vSphere 7.0 U2, and the maximum memory is 12 TB. For the largest supported VM size (8 sockets), OLAP workloads are supported with up to 6 TB of memory (Class L) or 12 TB (Class M; customers must size accordingly).
These memory limitations can be mitigated with workload-based sizing. For details, see SAP note 2779240.
VM size - full socket:
Minimum 1 socket / maximum 8 sockets
1-, 2-, 3-, 4-, and 8-socket VMs on servers with up to 8 sockets (fully and partially QPI meshed)
VM size - "half socket" (NUMA Node sharing VM):
NUMA Node sharing VMs on 2-, 4-, and 8-socket servers
No half-socket sizes other than 0.5 socket, i.e. no 1.5-socket, 2.5-socket, etc. VMs
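The sizing rules above can be encoded as a quick pre-check when planning a VM. The sketch below is a hypothetical helper (check_hana_vm_size is not an SAP or VMware tool); it assumes 12 TB means 12 × 1024 GB and does not check the 8-physical-core minimum or the OLAP Class L/M memory limits, which still need proper sizing.

```shell
#!/bin/sh
# Hypothetical helper: checks a planned SAP HANA VM size against the
# vSphere 7.0 U2 / Cascade Lake limits listed above. Not an official tool.
# Usage: check_hana_vm_size <vm_sockets> <host_sockets> <vcpus> <memory_gb>
check_hana_vm_size() {
    vm_sockets=$1 host_sockets=$2 vcpus=$3 mem_gb=$4

    # Supported host platforms: 2-, 4- and 8-socket servers
    case "$host_sockets" in
        2|4|8) ;;
        *) echo "unsupported host socket count: $host_sockets"; return 1 ;;
    esac

    case "$vm_sockets" in
        0.5) ;;  # half-socket (NUMA node sharing) VM, allowed on 2-, 4-, 8-socket hosts
        1|2|3|4|8)
            if [ "$vm_sockets" -gt "$host_sockets" ]; then
                echo "VM needs more sockets than the host has"; return 1
            fi ;;
        *) echo "unsupported VM socket count: $vm_sockets"; return 1 ;;  # e.g. 1.5, 2.5
    esac

    if [ "$vcpus" -gt 448 ]; then
        echo "more than 448 vCPUs"; return 1
    fi
    # 128 GB minimum, 12 TB maximum (assumed here as 12 * 1024 GB)
    if [ "$mem_gb" -lt 128 ] || [ "$mem_gb" -gt 12288 ]; then
        echo "memory must be between 128 GB and 12 TB"; return 1
    fi
    echo "ok"
}

check_hana_vm_size 8 8 448 12288
```

For example, check_hana_vm_size 1.5 4 84 768 is rejected because 1.5-socket VMs are not supported.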
VMXNET3 SAP workload-specific network latency considerations
As mentioned in SAP note 3102813 and documented in VMware KB 83957, virtualized networking based on VMXNET3 NICs typically adds 60 µs (no load) and up to 220 µs (high CPU load, ~80%) of latency to every network packet sent, compared with a bare-metal SAP HANA system. This can impact SAP OLTP and OLAP workloads issued by remote application servers or users.
This latency increase affects the overall SAP HANA database request time and may be noticed by users under certain conditions, such as during high IP traffic bursts. In typical environments with low OLTP transaction frequency, and in OLAP environments, the latency caused by the virtual network does not impact HANA DB runtime or TPH/QPH results.
To test the impact of virtualization, especially on the DB request time, we ran an S/4HANA mixed workload test on an 8-socket wide, 12 TB HANA VM with VMXNET3 NICs and compared it with a bare-metal SAP HANA installation on the same 8-socket Cascade Lake 8280L system. Between 35,000 and 78,000 concurrent users were used for this measurement. The 78,000 users generated a massive amount of OLTP transactions and a CPU load of up to ~80%. Beyond 80% CPU load, TPH performance degrades; this point is defined as the so-called max-out measurement point.
Comparing the observed VMXNET3 latency overhead of the 8-socket wide VM with 416 vCPUs (32 CPU threads were reserved to handle this massive network load) against a natively installed SAP HANA system on the same hardware, we see how these microseconds accumulate into a DB request time deviation of between 24 ms and ~100 ms. See figure 1 for this comparison.
Figure 1 - Mixed workload OLTP DB request time in ms (lower is better)
While the DB request time increases by up to 22% (24 ms) at 35% CPU utilization, the OLTP TPH and OLAP QPH results were not impacted; see figure 2. At ~65% CPU utilization, the DB request time increased by 40-45% (76-93 ms) with a TPH deviation of ~-3%, and at the maximum user capacity of 78,000 users at ~80% CPU utilization, the TPH/QPH impact was ~-12%.
Using a passthrough network device instead of a VMXNET3 network card reduces the DB request time and keeps the TPH/QPH deviation within -4% at the max-out point. In addition, reserving CPU threads to handle network traffic on the ESXi side is not necessary, since the network traffic is handled inside the VM. See figures 1 and 2.
Figure 2 - Mixed workload OLTP DB TPH (higher is better)
The test results show that, in this specific S/4HANA mixed workload test, the main impact is measurable at higher user loads, which generate massive numbers of OLTP requests in a very short time, while the impact on OLAP DB request time is very small; see figure 3.
Figure 3 - Mixed workload OLAP DB request time in ms (lower is better)
Network Best Practices
As mentioned, virtual networking adds latency. This latency cannot be removed, but it can be reduced. The following best practices lower the latency of the VMXNET3 virtual network card:
Verify and, if required, correct your physical network configuration and topology. The ESXi host running SAP HANA VMs should be connected directly to the SAP application server network switch; no firewall or bandwidth- or latency-limiting network link should sit in between.
Size the ESXi host according to the number of VMs running on it. Most recommendations SAP has published apply to a single SAP HANA host. If you consolidate several SAP HANA VMs on one host, you may want to add physical network cards to the ESXi host.
Reduce the number of vCPUs of the VM to free up CPU cycles that ESXi can use to better handle intensive OLTP network traffic. For example, configure an 8-socket wide VM with 416 vCPUs instead of 448, or a 4-socket wide VM on a 4-socket system with 212 vCPUs instead of 224.
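The vCPU numbers above follow from simple thread arithmetic. The sketch assumes a 28-core Cascade Lake CPU (such as the 8280L used in the tests) with Hyper-Threading enabled, i.e. 56 threads per socket; vm_vcpus is a hypothetical helper, and the reserved-thread totals (32 and 12) are derived from the 448/416 and 224/212 figures.

```shell
#!/bin/sh
# Assumption: 28 cores x 2 hyperthreads = 56 threads per socket
THREADS_PER_SOCKET=56

# Hypothetical helper: vCPUs left for a socket-wide VM after reserving
# a total number of threads for ESXi network processing.
# Usage: vm_vcpus <sockets> <reserved_threads_total>
vm_vcpus() {
    echo $(( $1 * THREADS_PER_SOCKET - $2 ))
}

vm_vcpus 8 0    # 448: maximum 8-socket VM
vm_vcpus 8 32   # 416: 8-socket VM with 32 threads left to ESXi
vm_vcpus 4 12   # 212: 4-socket VM with 12 threads left to ESXi
```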
Apply the following settings to lower VMXNET3 virtual network latency:
1. Disable the RxQueueFeatPair setting on the ESXi host. For this to take effect, the ESXi host needs to be rebooted.
Procedure: On the ESXi console, set the following parameter:
esxcli system settings advanced set -o /Net/NetNetqRxQueueFeatPairEnable -i 0
2. Change the VMXNET3 driver settings in the guest OS: lower rx-usec (interrupt coalescing) from the default value of 250 to 25, disable LRO, and set the rx/tx ring sizes to 512.
Procedure: Log on to the OS running inside the VM and use ethtool to change the settings below.
Note: If you have more than one network card configured, select the NIC used for SAP application server to DB traffic, e.g. eth1.
ethtool -C ethX rx-usec 25
ethtool -K ethX lro off
ethtool -G ethX rx 512 rx-mini 0 tx 512
NOTE: Replace ethX with the actual interface name, e.g. eth0.
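Steps 1 and 2 can be wrapped in a small script that also reads the values back. This is a sketch under stated assumptions: the function names are hypothetical, the code is kept to POSIX sh because the ESXi shell is BusyBox-based, and commands are only printed (dry run) unless APPLY=1 is set. The read-back uses ethtool -c/-k/-g and esxcli system settings advanced list.

```shell
#!/bin/sh
# Sketch of steps 1 and 2 above. Function names are hypothetical.
# Commands are only printed (dry run) unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"          # execute the command
    else
        echo "$*"     # dry run: print the command only
    fi
}

# Step 1, run in the ESXi shell. Takes effect only after a host reboot.
esxi_disable_rx_queue_feat_pair() {
    opt=/Net/NetNetqRxQueueFeatPairEnable
    run esxcli system settings advanced set -o "$opt" -i 0
    # Read back: "Int Value" should show 0 after a real run
    run esxcli system settings advanced list -o "$opt"
}

# Step 2, run as root in the guest OS for the SAP app server to DB NIC.
# Note: ethtool changes are typically not persistent across reboots.
# Usage: tune_vmxnet3 <interface>, e.g. tune_vmxnet3 eth1
tune_vmxnet3() {
    nic=$1
    if [ -z "$nic" ]; then
        echo "usage: tune_vmxnet3 <interface>" >&2
        return 1
    fi
    run ethtool -C "$nic" rx-usec 25                 # interrupt coalescing
    run ethtool -K "$nic" lro off                    # disable LRO
    run ethtool -G "$nic" rx 512 rx-mini 0 tx 512    # ring sizes
    # Read back coalescing (-c), offload features (-k) and ring sizes (-g)
    run ethtool -c "$nic"
    run ethtool -k "$nic"
    run ethtool -g "$nic"
}

esxi_disable_rx_queue_feat_pair
tune_vmxnet3 eth1
```

Running it first without APPLY=1 shows exactly which commands would be executed, which makes it easy to confirm the correct interface before changing anything.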
If these optimizations do not lower the DB request time into the required range, use a passthrough (PT) NIC for the SAP application server network. Refer to the VMware product documentation for how to configure this (see links below).
At moderate CPU utilization (between 30 and 50%) and moderate OLTP transaction frequency, the impact on OLTP is also moderate, so most customers should not notice the higher DB request time caused by a virtualized VMXNET3 network adapter.
Customers with DB-request-time-sensitive applications may want to consider PT NICs instead of VMXNET3 NICs, at the cost of vMotion, which is not possible with PT adapters.
Validation of 8-socket wide SAP HANA VMs for Scale-Out and TDI deployments up to 18 TB with DRAM and PMem is planned.