Replacing vCenter with vSAN Encryption Enabled

In my previous post, I talked about vSAN Encryption configuration and key re-generation, among other topics. In that post you can see that there is a trust relationship between vCenter and the KMS server/cluster. But what happens if my vCenter dies, gets corrupted, or I simply want to build a new vCenter and migrate my vSAN nodes to it with Encryption enabled?

One day, the lab that hosts my vCenter had a power issue and the VCSA became corrupted. I was able to fix the issue, but then discovered that SSO was not working. I figured it was faster to deploy a new VCSA rather than troubleshoot (yes, I’m impatient). I deleted the old vCenter and proceeded to deploy a new VCSA.

As I was adding the hosts to the new vCenter, I remembered that vSAN encryption was enabled. Now what? Sure enough, after all the hosts were moved, the drives in the Disk Groups were showing as unmounted. I went ahead and created a new relationship with the same KMS cluster, but the issue persisted.

If you run the command "vdq -q" from one of the hosts, you will see that your drives are not mounted and are ineligible for use by vSAN. In the UI you will see that your disks are encrypted and locked because the encryption key is not available.
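To make that check quicker to read across several hosts, here is a minimal sketch that trims the vdq output down to the disk name, state, and reason. It assumes the output contains quoted "Name", "State", and "Reason" fields, which is how I remember it looking; the function name vdq_summary is just something I made up for this example, so adjust the pattern if your build prints something different.

```shell
# Hypothetical helper: keep only the Name / State / Reason lines
# from whatever is piped in (assumed to be the output of "vdq -q").
vdq_summary() {
    grep -E '"(Name|State|Reason)"[[:space:]]*:'
}
```

On a host you would run it as: vdq -q | vdq_summary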

The FIX:

To recover from this and similar scenarios, you need to re-create the KMS cluster configuration in the new vCenter exactly as it was before. Although I did establish a relationship with the same KMS cluster, I missed a very important detail: the KMS cluster ID.

It is imperative that the KMS cluster ID stays the same for the recovery feature to work. Let’s think about the behavior. Although the old vCenter is gone, the hosts still have the information and keys from the KMS cluster. If we connect to the same KMS cluster with the same cluster ID, the hosts will be able to retrieve the key (assuming the key still exists and was not deleted). The KMS credentials are then re-applied to all hosts so that they can connect to the KMS and get the keys.

Remember that the old vCenter was removed, so I couldn’t retrieve the KMS cluster ID from the vCenter KMS config screen, and this environment was undocumented since it is one of my labs (it is now documented). Now what?

Glad you asked. Let’s take a look at the encryption operation.

In this diagram we can see how the keys are distributed to vCenter, hosts, etc. The KMS server settings are passed to hosts from vCenter by the KEK_id.

To obtain the KMIP cluster ID, we need to look for it in the esx.conf file on the hosts. You can use cat, vi, or grep (easier) to look at the conf file. You want to look for kmipClusterId, name (alias), etc. Make sure the KMS cluster on the new vCenter is configured exactly as it was before.

cat /etc/vmware/esx.conf

or something easier…

grep "/vsan/kmipServer/" /etc/vmware/esx.conf
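If you only want the cluster ID itself, something like the sketch below can pull the value out. It assumes each esx.conf entry is written as a key followed by = and a quoted value; the function name kmip_cluster_id is made up for this example and is not a VMware tool.

```shell
# Hypothetical helper: print the value of every esx.conf key that
# contains kmipClusterId. Assumes lines look like: /some/key = "value"
kmip_cluster_id() {
    # Default to the host's esx.conf when no file is given.
    conf="${1:-/etc/vmware/esx.conf}"
    # Strip everything up to the opening quote, then the closing quote.
    grep 'kmipClusterId' "$conf" | sed -e 's/^[^=]*= *"//' -e 's/" *$//'
}
```

Run kmip_cluster_id with no arguments on each host and compare the results. They should all match, and that is the value to use as the cluster ID when you add the KMS cluster to the new vCenter.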

After the KMS cluster has been added to the new vCenter exactly as it was configured in the old vCenter, there is no need for reboots. During reconfiguration the new credentials are sent to all hosts, and the hosts should reload the keys for all disks within a few minutes.