What’s new in vSAN Encryption 6.7 U1?

I’ve written a few blog posts in the past about vSAN Data at Rest Encryption (D@RE). These posts explain how encryption works, and how the keys are handed over to vSphere. Go here for more info.

For vSAN D@RE to work properly, ESXi hosts need to be able to reach the KMS cluster during reboot operations. Yes, hopefully you have a cluster for redundancy, but a single KMS server will still work. This is necessary in order for ESXi hosts within the vSAN cluster to be able to obtain both the Host Encryption Key (let’s call this HEK), and the Key Encryption Key (KEK).

Wait!!! Why do we have to go to KMS again if we already received the keys?!?!

See, the Host Encryption Key and the Key Encryption Key live in a non-persistent state in memory, in the key cache. When a vSAN node (ESXi server) is rebooted, these keys go away (poof…gone). So, when vSAN encryption is enabled and a host is rebooted, it needs to go out to the KMS and get those keys again. Before you consider rebooting your hosts, make sure that they can talk to the KMS and that the KMS still has your keys. Oh yeah, it goes without saying that the KMS should NOT run inside the vSAN cluster it protects: the hosts could not mount the datastore holding the KMS without first getting keys from that same KMS.
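As a quick pre-reboot sanity check, something like the sketch below can confirm that a KMS endpoint accepts TCP connections. It is only a minimal illustration: the hostname is hypothetical, port 5696 is the registered KMIP port and is assumed to be what your KMS listens on, and the check runs from wherever you launch it, not from the ESXi hosts themselves. It says nothing about whether the KMS still holds your keys.

```python
import socket


def kms_reachable(host: str, port: int = 5696, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "kms01.example.local" is a placeholder, not a real endpoint.
    for kms in ["kms01.example.local"]:
        state = "reachable" if kms_reachable(kms) else "NOT reachable"
        print(f"{kms}: {state}")
```

A TCP connect is deliberately the only thing tested here; a real validation would also need to confirm the trust relationship and key availability from vCenter.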

Once the HEK is obtained, the host reaches a crypto-safe mode, which allows the host to reach a good operational state and continue with the boot process, at which point it asks the KMS for the KEK. If the host is not able to obtain these keys from the KMS cluster, it will continue to boot; however, its disks will not be mounted: the host never reached crypto-safe mode and could not obtain the KEK, so it cannot unwrap the Data Encryption Key (DEK).
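The boot sequence described above can be modeled as a small decision function. This is purely an illustrative sketch of the logic in this post, not VMware code; the function name, the dictionary standing in for the KMS, and the result names are all made up for the example.

```python
from enum import Enum


class BootResult(Enum):
    DISKS_MOUNTED = "host booted, disks mounted"
    DISKS_NOT_MOUNTED = "host booted, disks NOT mounted"


def boot_encrypted_host(kms_keys: dict, kms_is_reachable: bool) -> BootResult:
    """Model a reboot: the key cache starts empty, so both keys must come from the KMS."""
    if not kms_is_reachable:
        return BootResult.DISKS_NOT_MOUNTED

    # Step 1: the HEK is needed to reach crypto-safe mode.
    if kms_keys.get("HEK") is None:
        return BootResult.DISKS_NOT_MOUNTED

    # Step 2: the KEK is needed to unwrap the DEK and mount the disks.
    if kms_keys.get("KEK") is None:
        return BootResult.DISKS_NOT_MOUNTED

    return BootResult.DISKS_MOUNTED
```

In every failure branch the host still finishes booting; only the disks stay unmounted, which is why the problem can go unnoticed until you look at the datastore.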

In a scenario where hosts are being updated/upgraded via VUM, in most cases the hosts will do a rolling reboot as part of the VUM process. With vSAN versions 6.7 and prior, rolling reboots of hosts via VUM were allowed regardless of the state of the connection with KMS and the availability of keys. As already described, these keys are necessary in order to properly mount the drives on each host during a reboot.

In vSAN 6.7 Update 1, VMware has added guard rails to prevent disks of multiple hosts from unmounting due to lack of connectivity with KMS, or accidental key deletion. During an upgrade operation, VUM will place a host in Enhanced Maintenance Mode (EMM), perform updates, reboot, and exit EMM. If, after a reboot, the host is not able to reach crypto-safe mode, the host will not exit EMM, stalling the VUM progress. In this case, the host’s drives are not mounted because it could not reach crypto-safe mode. If the upgrade were allowed to continue, every other host would be upgraded and rebooted in turn and hit the same problem, until all the drives within the vSAN datastore were unmounted.
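To see why stalling is the safer behavior, here is a toy simulation of a rolling upgrade with and without the guard rail. It is a sketch of the idea only, not how VUM is implemented; the host names and the `reaches_crypto_safe_mode` callback are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def rolling_upgrade(
    hosts: List[str],
    reaches_crypto_safe_mode: Callable[[str], bool],
    guard_rail: bool = True,
) -> Tuple[List[str], List[str], Optional[str]]:
    """Return (rebooted hosts, hosts with unmounted disks, host the run stalled on)."""
    rebooted: List[str] = []
    unmounted: List[str] = []
    for host in hosts:
        # Enter maintenance mode, apply updates, reboot.
        rebooted.append(host)
        if not reaches_crypto_safe_mode(host):
            unmounted.append(host)
            if guard_rail:
                # Host stays in maintenance mode; the remaining hosts are left untouched.
                return rebooted, unmounted, host
    return rebooted, unmounted, None


if __name__ == "__main__":
    cluster = ["esx01", "esx02", "esx03", "esx04"]
    kms_down = lambda host: False  # no host can fetch its keys
    print(rolling_upgrade(cluster, kms_down, guard_rail=False))
    print(rolling_upgrade(cluster, kms_down, guard_rail=True))
```

With the guard rail off and the KMS unreachable, all four hosts end up with unmounted disks; with it on, the damage stops at the first host.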

This new guard rail helps prevent losing all vSAN storage due to connectivity issues, accidental changes on the KMS, or missing keys. This feature also highlights the benefits of having an HCI solution embedded in the kernel: the ease of orchestration with other vSphere components and features makes vSAN even more appealing.

vSAN 6.7 Upgrade Considerations

On April 17, 2018, VMware released vSphere 6.7. This includes vCenter, ESXi, and of course vSAN. A lot of people are looking to upgrade in order to take advantage of the new features, and enhancements; primarily the H5 client… good bye flash client! Links to more info on what’s new for vSphere and vSAN.

From an HTML5 client perspective, the feature parity is about 95%. For vSAN alone, it is about 99%, as we are only missing Configuration Assist, which is still available via the flash client.

I see a lot of people getting confused and still believing that vSAN is a separate upgrade, just like traditional storage. Fortunately, vSAN is in the kernel, so once you upgrade ESXi you have also upgraded vSAN. Boom!!! Even though the version numbers may not match exactly between ESXi and vSAN, they still go hand-in-hand. With that, it is important to understand the steps necessary for a vSAN upgrade.

Based on the nature of vSAN, we need to follow the vSphere upgrade path. This includes checking the VMware Product Interoperability Matrices, not only with your current versions against the versions you are going to upgrade to, but also all the other VMware products such as NSX, SRM, vROps, etc.

 

Upgrade Process Overview

From an upgrade process perspective, you have options. You can migrate your Windows vCenter to the vCenter Appliance (recommended). If you already have the vCenter Appliance, you can either do an in-place upgrade, or create a new vCenter if you want to migrate your hosts over to a fresh new vCenter install. Here is more info on vSphere upgrades.

  1. Upgrade PSC
  2. Upgrade, Migrate, or deploy new vCenter
    1. Depends on current version
  3. Upgrade ESXi
    1. This will also upgrade vSAN (easy, right?)
  4. Upgrade vSAN on-disk Format (ODF)
  5. Upgrade VMware tools
  6. Upgrade vDS

 

As previously discussed, you will need to check the Product Interoperability Matrix to make sure all your products can run on vSphere 6.7. Don’t shoot from the hip, and start upgrading before you have done proper research.

 

I mentioned the choice of migrating hosts to a new vCenter. This is something I do quite often in my labs, and it is a simple process.

Migration Process Overview

  1. Export vDS configuration (including port groups)
  2. Copy licenses from old vCenter
  3. Configure new vCenter
  4. Create Datacenter in vCenter
  5. Create a Cluster and enable vSAN on it
    1. If you currently have other services enabled, they will have to be enabled on the new vCenter as well prior to moving the hosts.
  6. Add licenses
    1. Assign license to vCenter
    2. Assign vSAN license to cluster asset
  7. Create vDS on the new vCenter
  8. Import configuration and port groups to new vCenter
  9. On the new vCenter, add hosts
    1. No need to disconnect hosts on the old vCenter, they will disconnect after connecting to the new vCenter.
    2. Confirm ESXi license or assign a new one.
  10. Connect the hosts to the vDS (imported)
    1. Make sure you go through and verify assignment of uplinks, failover order, vmkernel ports, etc.
  11. Lastly, you will need to tell vSphere that this new vCenter is now authoritative
    1. You will get an alert about this

 


Considerations when Enabling vSAN Encryption

In previous posts, I talked about the vSAN Encryption architecture and how to enable the feature. However, there are a couple of considerations, aside from the requirements, that should be taken into account prior to enabling vSAN Encryption.

BIOS Settings:

With most deployments, whether vSphere or vSAN, I’ve noticed that BIOS settings are often overlooked, even though they can help increase performance with a simple change. One of those settings is AES-NI. AES-NI was proposed by Intel some time back, and it is essentially a set of [new] instructions (NI) for the Advanced Encryption Standard (AES); hence the acronym AES-NI. What AES-NI does is provide hardware acceleration to applications using AES for encryption and decryption.

Most modern CPUs (Intel & AMD) support AES-NI, and some hardware vendors ship BIOS configurations with AES-NI already enabled by default. When considering vSAN Encryption, it is imperative to make sure that AES-NI has been enabled in the BIOS, in order to take advantage of this hardware acceleration, which both strengthens and speeds up the execution of AES applications.

Failure to enable AES-NI while Encryption is enabled may result in a dramatic increase in CPU utilization. In recent versions of vSAN, the Health Check UI detects and alerts when AES-NI has not been enabled. If the BIOS does not have the option to enable AES-NI, the feature is most likely always enabled.

Note: This also applies to VM encryption.

 

Available Space

The other consideration is available space. My previous posts talk about the data migration that occurs if vSAN Encryption is enabled after data has been moved into the vSAN datastore, because of the disk format task required. Although vSAN Encryption does not incur a space overhead for its operation, it is important to keep in mind that there needs to be enough available space to be able to evacuate an entire disk group during the configuration process.
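A rough way to reason about this is to check, for each disk group, whether the free space on all the other disk groups can absorb what it currently holds. The sketch below does only that arithmetic; the names and numbers are made up, and it ignores storage policies, fault domains, and slack space, so treat it as a first-pass estimate rather than what vSAN actually computes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskGroup:
    name: str
    capacity_gb: float
    used_gb: float


def can_evacuate(target: DiskGroup, groups: List[DiskGroup]) -> bool:
    """True if the free space on every other disk group can hold the target's data."""
    free_elsewhere = sum(
        g.capacity_gb - g.used_gb for g in groups if g.name != target.name
    )
    return free_elsewhere >= target.used_gb


def blocking_disk_groups(groups: List[DiskGroup]) -> List[str]:
    """Names of disk groups that could not be fully evacuated right now."""
    return [g.name for g in groups if not can_evacuate(g, groups)]


if __name__ == "__main__":
    cluster = [
        DiskGroup("dg-a", capacity_gb=500, used_gb=100),
        DiskGroup("dg-b", capacity_gb=2000, used_gb=1500),
        DiskGroup("dg-c", capacity_gb=500, used_gb=100),
    ]
    print(blocking_disk_groups(cluster))
```

An empty list means every disk group could be evacuated one at a time under this simplified model.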

 

Considerations for using LACP with vSAN

I am a firm believer in spending a good amount of time during the design phase of any deployment. Planning ahead, and knowing your limitations will make your life easier, maybe just a little bit, but every bit helps.

If you are planning on using LACP for link aggregation on vSAN, I strongly advise you to get familiar with your options and check the Network Design Guide at storagehub.vmware.com. In the guide you will learn about NIC teaming options and LACP requirements such as LAG and vDS, as well as the pros and cons (below).

Pros and Cons of dynamic Link Aggregation/LACP (from Storagehub)

Pros

  • Improves performance and bandwidth: One vSAN node or VMkernel port can communicate with many other vSAN nodes using many different load balancing options
  • Network adapter redundancy: If a NIC fails and the link-state goes down, the remaining NICs in the team continue to pass traffic.
  • Rebalancing of traffic after failures is fast and automatic

Cons

  • Physical switch configuration: Less flexible and requires that physical switch ports be configured in a port-channel configuration.
  • Complex: Introducing full physical redundancy configuration gets very complex when multiple switches are used. Implementations can become quite vendor specific.

 

In addition to this, you should also consider vSphere limitations while using LACP. Some of the limitations include:

  • LACP settings not available for host profiles
  • No port mirroring
  • Does not work with ESXi dump collector.

For a complete list of limitations, visit VMware Docs here

Be cognizant of some “gotchas” as well. For example, restarting management agents on ESXi with vSAN/LACP using the services.sh script may cause some issues: “services.sh restart” restarts every service, including the LACP daemon (/etc/init.d/lacp). Instead, use the “/etc/init.d/<module> restart” command to restart individual services. See KB1003490.
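If you script this, the idea is simply to build the per-service restart commands yourself and leave the LACP daemon out. The helper below is a hypothetical sketch that only generates the command strings; hostd and vpxa are used as examples of management agents, and the list you pass in is up to you.

```python
from typing import Iterable, List


def restart_commands(services: Iterable[str], skip: Iterable[str] = ("lacp",)) -> List[str]:
    """Build "/etc/init.d/<module> restart" commands, leaving out the skipped services."""
    skipped = set(skip)
    return [f"/etc/init.d/{name} restart" for name in services if name not in skipped]


if __name__ == "__main__":
    for command in restart_commands(["hostd", "vpxa", "lacp"]):
        print(command)
```

Printing the commands rather than executing them keeps the decision to run each one in your hands.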

 

Like with any other deployment, you should consider your PROs, CONs, future plans, and environment dependencies among others.

Remember the 7Ps – Proper Prior Planning Prevents Pitiful Poor Performance

 

Get Inbox drivers for Storage Controllers with vSAN

If you are familiar with vSAN, you already know that the VCG is a “must follow” guide to success. In certain instances, deployments may use ESXi OEM/custom images based on internal policies or even personal preference. However, these types of images contain vendor-specific tools, drivers, etc. One of the results of using such images is the use of async drivers for storage controllers rather than inbox drivers. For the sake of demonstration, we can focus on the PERC H730.

While checking config assist, you can see a warning stating that the recommended driver per the VCG is not currently installed. In the picture below you can see that we have uncertified drivers, and we need to roll the drivers back to the correct version.

Links to download such drivers are typically found within the VCG or the ESXi “drivers” tab on the downloads page. On some occasions, this link may not be present for the version that you are running. So how do I get the correct drivers? You certainly don’t want to be running drivers that have not been certified for vSAN, do you?!?! Of course NOT.

You can get such drivers from the ESXi offline bundle of the version you currently have. For example, let’s say you are running ESXi 6.5 U1. You will need to go to https://my.vmware.com, log in using your credentials, go to downloads, and select ESXi 6.5 U1 as the product. Download the Offline Bundle (zip). Once completed, unzip the file and navigate the vib folder until you find the correct driver. In this case we are looking for lsi_mr3 version 6.910.18.00-1vmw.650.0.0.4564106. You can then take that vib and use your preferred vib install method to update your drivers.
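Rather than unzipping and browsing by hand, you can list the matching VIBs straight from the offline bundle. This is a minimal sketch: the bundle filename is a placeholder, and it assumes the driver name appears in the VIB path with either a hyphen or an underscore (lsi_mr3 vs. lsi-mr3), which is why both are treated as the same.

```python
import zipfile
from typing import List


def find_driver_vibs(bundle, driver: str) -> List[str]:
    """List .vib entries in an offline bundle whose path mentions the driver name.

    `bundle` is a path or file-like object pointing at the offline bundle zip.
    """
    wanted = driver.lower().replace("_", "-")
    with zipfile.ZipFile(bundle) as zf:
        return [
            name
            for name in zf.namelist()
            if name.lower().endswith(".vib")
            and wanted in name.lower().replace("_", "-")
        ]


if __name__ == "__main__":
    # Placeholder filename; point this at the bundle you downloaded.
    for vib in find_driver_vibs("esxi-offline-bundle.zip", "lsi_mr3"):
        print(vib)
```

Check the version string in the listed filename against the one the VCG recommends before extracting and installing it.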