vSAN Encryption KMS info retrieval

A few years ago I wrote a blog post about “Replacing vCenter with vSAN Encryption Enabled”. For that exercise, one key piece of information that had to be retrieved was the kmipClusterId.

A couple of things have changed since then in newer versions of vSAN.

Change #1: ESXCLI commands

An easier way to retrieve this information was added to esxcli. The esxcli vsan encryption namespace lets you check the state of vSAN encryption and retrieve values such as the hostKeyId, kekID, etc.

esxcli vsan encryption <option> get/list


With this addition, you can now get the kmipClusterId needed for a vCenter replacement by running esxcli vsan encryption kms list.
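Since the kmipClusterId is exactly what you need once the old vCenter is gone, it can be worth saving this output from every host ahead of time. The snippet below is a minimal sketch, assuming an ESXi shell (SSH) session on the host. The function name save_kms_list and the directory you pass in are hypothetical; pick persistent storage, ideally copied off the cluster afterwards, since /tmp on ESXi is a ramdisk.

```shell
# Hypothetical helper: save this host's vSAN KMS configuration to a dated file.
# $1: directory to write into (choose persistent storage; /tmp on ESXi is not).
save_kms_list() {
  out="$1/kms-list-$(uname -n)-$(date +%Y%m%d).txt"
  # Same command as above; its output includes the kmipClusterId.
  esxcli vsan encryption kms list > "$out" && echo "$out"
}

# Usage on an ESXi host (the path is only an example):
#   save_kms_list /vmfs/volumes/datastore1
```

The function prints the path of the file it wrote, so you can copy it somewhere safe right away.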

You can still find this information in the esx.conf file, which is where the hosts store it in this particular version of vSAN (6.7 P01 – Build 15160138). Which brings me to the second update…


Change #2: vSAN Persistence

In vSAN 7.0 and later, the way this configuration is stored has changed. The encryption information that was previously file-based (esx.conf) is now stored in a database. Among other advantages, this provides better concurrency for multiple readers and writers than the file-based esx.conf.

The good news is that the esxcli vsan encryption commands still let you retrieve the encryption information you need. However, if you look for it in the esx.conf file, you won’t find it there anymore.

Alternatively, you can retrieve the information directly from the config-store… but that is probably more info than you need, so I’d just stick to the esxcli commands.

Replacing vCenter with vSAN Encryption Enabled

In my previous post, I talked about vSAN Encryption configuration and key re-generation, among other topics. In that post you can see that there is a trust relationship between vCenter and the KMS server/cluster. But what happens if my vCenter dies or gets corrupted, or I simply want to build a new vCenter and migrate my vSAN nodes to it with Encryption enabled???

One day, the lab that hosts my vCenter had a power issue and the VCSA appliance became corrupted. I was able to fix the issue, but then discovered that SSO was not working. I figured it was faster to deploy a new VCSA appliance than to troubleshoot (yes, I’m impatient). I deleted the old vCenter and proceeded to deploy a new VCSA.

As I was adding the hosts to the new vCenter, I remembered that vSAN encryption was enabled. Now what? Sure enough, after all the hosts were moved, the drives in the Disk Groups were showing as unmounted. I went ahead and created a new relationship with the same KMS cluster, but the issue persisted.

If you run the command vdq -q from one of the hosts, you will see that your drives are not mounted and are ineligible for use by vSAN. In the UI you will see that your disks are encrypted and locked because the encryption key is not available.
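To pick out the affected drives without reading the whole vdq -q output, a small filter can help. This is a sketch that assumes vdq -q prints a JSON-like block per disk containing "Name", "State" and "Reason" lines, with locked drives reported as "Ineligible for use by VSAN"; check that against the output on your own build. The function name ineligible_disks is hypothetical. Note that non-vSAN devices, such as a local boot disk, can also show up as ineligible, which is why the Reason is printed next to each name.

```shell
# Hypothetical helper: list disks that vdq reports as ineligible for vSAN.
# Reads `vdq -q` output on stdin; prints "<name>: <reason>" per ineligible disk.
ineligible_disks() {
  awk -F'"' '
    /\{/       { name = ""; state = ""; reason = "" }   # start of a disk record
    /"Name"/   { name = $4 }
    /"State"/  { state = $4 }
    /"Reason"/ { reason = $4 }
    /\}/ && state == "Ineligible for use by VSAN" { print name ": " reason }
  '
}

# Usage on an ESXi host:
#   vdq -q | ineligible_disks
```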

The FIX:

To recover from this and similar scenarios, you need to add the KMS cluster to the new vCenter with exactly the same configuration as before. Although I did establish a relationship with the same KMS cluster, I missed a very important detail: the KMS cluster ID.

The KMS cluster ID must stay the same for the recovery to work. Let’s think about the behavior. Although the old vCenter is gone, the hosts still have the information and keys from the KMS cluster. If we connect to the same KMS cluster with the same cluster ID, the hosts will be able to retrieve the key (assuming the key still exists and was not deleted). The KMS credentials will be re-applied to all hosts so that the hosts can connect to the KMS to get the keys.

Remember that the old vCenter was removed, so I couldn’t retrieve the KMS cluster ID from the vCenter KMS config screen, and this environment was undocumented since it is one of my labs (it is now documented). Now what?

Glad you asked. Let’s take a look at the encryption operation.

In this diagram we can see how the keys are distributed to vCenter, hosts, etc. vCenter passes the KMS server settings to the hosts along with the KEK_id.

To obtain the KMIP cluster ID, look for it in the hosts’ esx.conf file. You can use cat, vi, or grep (easier) to look at the conf file. Look for kmipClusterId, name (alias), etc. Make sure the KMS cluster on the new vCenter is configured exactly as it was before.

cat /etc/vmware/esx.conf 

or something easier…

grep "/vsan/kmipServer/" /etc/vmware/esx.conf
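If you only want the values themselves, a small helper can pull one setting at a time out of esx.conf. This is a sketch, assuming the entries under /vsan/kmipServer/ follow the usual esx.conf form of /path/to/key = "value", with the setting name as the last path component. The function name kmip_setting is hypothetical. On vSAN 7.0 and later these entries are no longer in esx.conf, so esxcli vsan encryption kms list is the way to go there.

```shell
# Hypothetical helper: print the value(s) of one KMIP server setting from esx.conf.
# $1: setting name, e.g. kmipClusterId or name
# $2: optional path to the file (defaults to the host's esx.conf)
kmip_setting() {
  grep '^/vsan/kmipServer/' "${2:-/etc/vmware/esx.conf}" |
    awk -F' = ' -v k="$1" '{
      # Last path component of the key, e.g. .../kmipClusterId -> kmipClusterId
      n = split($1, parts, "/")
      if (parts[n] == k) {
        v = $2
        gsub(/"/, "", v)   # strip the surrounding double quotes
        print v
      }
    }'
}

# Usage on a vSAN 6.x host:
#   kmip_setting kmipClusterId
#   kmip_setting name
```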

After the KMS cluster has been added to the new vCenter exactly as it was configured in the old vCenter, there is no need for reboots. During reconfiguration the new credentials are sent to all hosts, and the hosts should reload the keys for all disks within a few minutes.
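Rather than re-running vdq -q by hand while the keys reload, you could poll it. A minimal sketch, assuming mounted vSAN drives (cache and capacity alike) are reported in vdq -q output with a State of "In-use for VSAN"; verify this on your build. It counts in-use drives against a number you supply instead of waiting for "ineligible" to disappear, because non-vSAN devices such as a boot disk may stay ineligible forever. The function name and the retry defaults are hypothetical.

```shell
# Hypothetical helper: poll vdq until the expected number of disks are in use by vSAN.
# $1: number of disks you expect to see "In-use for VSAN" on this host
# $2: number of checks (default 20), $3: seconds between checks (default 30)
wait_for_vsan_disks() {
  expected=$1
  tries=${2:-20}
  pause=${3:-30}
  while [ "$tries" -gt 0 ]; do
    # Count the State lines that report a disk as in use by vSAN
    in_use=$(vdq -q | grep -c '"State" *: "In-use for VSAN"')
    if [ "$in_use" -ge "$expected" ]; then
      echo "$in_use disk(s) in use by vSAN"
      return 0
    fi
    tries=$((tries - 1))
    # Skip the pause after the final check
    [ "$tries" -gt 0 ] && sleep "$pause"
  done
  echo "Only $in_use of $expected disk(s) in use by vSAN" >&2
  return 1
}

# Usage on a host with, say, one cache and five capacity drives:
#   wait_for_vsan_disks 6
```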