Considerations for using LACP with vSAN

I am a firm believer in spending a good amount of time on the design phase of any deployment. Planning ahead and knowing your limitations will make your life easier. Maybe just a little bit, but every bit helps.

If you are planning on using LACP for link aggregation on vSAN, I strongly advise you to get familiar with your options and check the Network Design Guide at storagehub.vmware.com. There you will learn about NIC teaming options and LACP requirements such as LAG and vDS, as well as the pros and cons (below).

Pros and Cons of dynamic Link Aggregation/LACP (from Storagehub)

Pros

  • Improves performance and bandwidth: One vSAN node or VMkernel port can communicate with many other vSAN nodes using many different load balancing options
  • Network adapter redundancy: If a NIC fails and the link-state goes down, the remaining NICs in the team continue to pass traffic.
  • Rebalancing of traffic after failures is fast and automatic

Cons

  • Physical switch configuration: Less flexible and requires that physical switch ports be configured in a port-channel configuration.
  • Complex: Introducing full physical redundancy configuration gets very complex when multiple switches are used. Implementations can become quite vendor specific.

 

In addition to this, you should also consider vSphere limitations while using LACP. Some of the limitations include:

  • LACP settings are not available in host profiles
  • Port mirroring is not supported
  • LACP does not work with the ESXi dump collector

For a complete list of limitations, see the VMware Docs.

Be cognizant of some “gotchas” as well. For example, restarting the management agents on an ESXi host running vSAN with LACP via the services.sh script may cause issues, because “services.sh restart” restarts every service, including the LACP daemon (/etc/init.d/lacp). Instead, use “/etc/init.d/<module> restart” to restart individual services. See KB1003490.
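
As a rough sketch of the per-service approach, here is a small helper that restarts one agent at a time. The function name restart_agent is my own; hostd and vpxa are the two management agents I would normally restart this way, so adjust the names to whatever you actually need.

```shell
# Restart a single ESXi management agent by name instead of running
# "services.sh restart", which would also restart /etc/init.d/lacp.
# restart_agent is a hypothetical helper name, not a VMware tool.
restart_agent() {
    script="/etc/init.d/$1"
    if [ ! -x "$script" ]; then
        echo "no init script for '$1' on this host" >&2
        return 1
    fi
    "$script" restart
}

# Typical usage on an ESXi host: restart only hostd and vpxa.
# restart_agent hostd
# restart_agent vpxa
```

The point of the wrapper is simply that the LACP daemon is never touched unless you ask for it by name.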

 

As with any other deployment, you should weigh the pros, cons, future plans, and environment dependencies, among other factors.

Remember the 7Ps – Proper Prior Planning Prevents Pitiful Poor Performance

 

Get Inbox drivers for Storage Controllers with vSAN

If you are familiar with vSAN, you already know that the VCG is a “must follow” guide to success. In certain instances, deployments may use ESXi OEM/custom images based on internal policies or even personal preference. However, these types of images contain vendor-specific tools, drivers, etc. One result of using such images is that storage controllers end up on async drivers rather than inbox drivers. For the sake of demonstration, we can focus on the PERC H730.

While checking config assist, you can see a warning stating that the recommended driver per the VCG is not currently installed. In the picture below you can see that we have uncertified drivers, and we need to roll the drivers back to the correct version.

Links to download such drivers are typically found within the VCG or under the ESXi “drivers” tab on the downloads page. On some occasions, this link may not be present for the version that you are running. So how do I get the correct drivers? You certainly don’t want to be running drivers that have not been certified for vSAN, do you? Of course NOT.

You can get such drivers from the ESXi offline bundle of the version you currently have. For example, let’s say you are running ESXi 6.5 U1. You will need to go to https://my.vmware.com, log in using your credentials, go to downloads, and select ESXi 6.5 U1 as the product. Download the Offline Bundle (zip). Once completed, unzip the file and navigate the vib folder until you find the correct driver. In this case we are looking for lsi_mr3 version 6.910.18.00-1vmw.650.0.0.4564106. You can then take that vib and use your preferred vib install method to update your drivers.
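
Here is a minimal sketch of that last part, assuming the bundle has already been unzipped somewhere. The helper name find_driver_vib and all paths are placeholders of mine; the pattern matches both “lsi-mr3” and “lsi_mr3” because I am not assuming which spelling the VIB file name uses.

```shell
# Locate the lsi_mr3 driver VIB inside an extracted ESXi offline bundle.
# find_driver_vib is a hypothetical helper; the pattern accepts either
# "lsi-mr3" or "lsi_mr3" in the file name.
find_driver_vib() {
    bundle_dir="$1"
    find "$bundle_dir" -type f -name '*lsi[-_]mr3*.vib'
}

# Example session (all paths are placeholders):
#   unzip ESXi-offline-bundle.zip -d /vmfs/volumes/datastore1/bundle
#   find_driver_vib /vmfs/volumes/datastore1/bundle
#   esxcli software vib list | grep -i mr3     # driver currently installed
#   esxcli software vib install -v /full/path/to/the-found.vib
```

Note that esxcli expects the full path to the VIB file, and you should confirm the version string against the VCG before installing.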

 

What is VM Overreserved and Why is it taking so much Space?

Today I saw a question in the VMTN communities and thought it was “blog worthy”. In essence the question was “What is VM Overreserved and Why is it taking so much space?”, hence the very original title for this blog…

When we talk about VM overreserved, we are talking in the context of VMs within vSAN, more specifically Objects within vSAN. There are a couple of KB articles explaining that VM overreserved is seen when dedupe & Compression is disabled and we are using Object Space Reservation (OSR = non-zero). To quote the KB “Used – VM Over-reserved: Space wasted due to higher than needed space reservation setting. Reducing object space reservation policy can free up space without the need to delete or move any data”

What if OSR is set to 0%, but I still see a lot of space being consumed???

Remember we are talking about objects. By default, vSAN thin-provisions objects on the back end; however, this does not apply to swap files, which are thick provisioned by default. To take it a step further, swap objects use the default policy (FTT=1 by default). This means that a swap object in vSAN will consume “VM memory size * 2” with FTT=1. If you raise the FTT on the default policy (FTT=2 or 3), the amount of space used will increase accordingly.
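
To put numbers on it, here is a quick back-of-the-napkin calculation. It assumes RAID-1 mirroring (so FTT+1 full copies) and a VM with no memory reservation; the function name swap_footprint_gb is just something I made up for the example.

```shell
# Raw vSAN capacity consumed by a thick-provisioned swap object.
# Assumes RAID-1 mirroring (FTT+1 full copies) and no memory reservation.
swap_footprint_gb() {
    mem_gb="$1"
    ftt="$2"
    echo $(( mem_gb * (ftt + 1) ))
}

swap_footprint_gb 16 1   # 16 GB VM, FTT=1 -> prints 32
swap_footprint_gb 16 2   # same VM, FTT=2  -> prints 48
```

Multiply that by the number of powered-on VMs and it becomes clear why the “VM Overreserved” slice can get big.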

If you do not wish to have this space “wasted”, you can turn off thick-provisioned swap by setting the SwapThickProvisionDisabled advanced setting to 1 on the hosts… BUT, proper planning should be done prior to changing this. I wrote a blog about this not long ago: https://greatwhitetec.com/2017/03/20/vsan-sparse-swap/

The VM overreserved value appears when dedupe/compression is disabled. I did a quick test in my lab to demonstrate this.

  • Hosts had SwapThickProvisionDisabled set to 0
  • Used – VM overreserved = 40GB
  • Changed /VSAN/SwapThickProvisionDisabled to 1  (see my blog for how to…)
  • Turned VMs off/back on – remember, swap space is taken during VM boot and released when turned off.
  • Space of VM overreserved now = 0 GB
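
For reference, this is roughly how the setting from the test above can be read and changed from the ESXi shell. Treat it as a sketch: the wrapper name swap_thick_disabled is mine, and it has to be run on every host in the cluster.

```shell
# Show, and optionally change, the sparse-swap setting on one host.
# swap_thick_disabled is a hypothetical wrapper around esxcfg-advcfg.
swap_thick_disabled() {
    if ! command -v esxcfg-advcfg >/dev/null 2>&1; then
        echo "esxcfg-advcfg not found; run this on an ESXi host" >&2
        return 1
    fi
    if [ -n "${1:-}" ]; then
        esxcfg-advcfg -s "$1" /VSAN/SwapThickProvisionDisabled
    fi
    esxcfg-advcfg -g /VSAN/SwapThickProvisionDisabled
}

# swap_thick_disabled      # read the current value
# swap_thick_disabled 1    # set to 1 (swap objects become thin), then read back
```

Remember the change only affects swap objects created afterwards, which is why the VMs had to be power-cycled in the test.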

 

 

Replacing vCenter with vSAN Encryption Enabled

In my previous post, I talked about vSAN Encryption configuration and key re-generation, among other topics. In that post you can see that there is a trust relationship between vCenter and the KMS server/cluster. But what happens if my vCenter dies, gets corrupted, or I simply want to build a new vCenter and migrate my vSAN nodes to it with Encryption enabled?

One day, the lab that hosts my vCenter had a power issue and the VCSA appliance became corrupted. I was able to fix the issue, but then discovered that SSO was not working. I figured it was faster to deploy a new VCSA appliance rather than troubleshooting (yes, I’m impatient). I deleted the old vCenter and proceeded to deploy a new VCSA.

As I was adding the hosts to the new vCenter, I remembered that vSAN encryption was enabled. Now what? Sure enough, after all the hosts were moved, the drives within the Disk Groups were showing as unmounted. I went ahead and created a new relationship with the same KMS cluster, but the issue persisted.

If you run the command “vdq -q” from one of the hosts, you will see that your drives are not mounted and are ineligible for use by vSAN. In the UI you will see that your disks are encrypted and locked because the encryption key is not available.

The FIX:

In order to recover from this and similar scenarios, it is necessary to create a new KMS cluster configuration with the exact same settings as before. Although I did establish a relationship with the same KMS cluster, I missed a very important detail: the KMS cluster ID.

It is imperative that the KMS cluster ID stays the same in order for the recovery feature to work. Let’s think about the behavior. Although the old vCenter is gone, the hosts still have the information and keys from the KMS cluster. If we connect to the same KMS cluster with the same cluster ID, the hosts will be able to retrieve the key (assuming the key still exists and was not deleted). The KMS credentials will be re-applied to all hosts so that they can connect to the KMS to get the keys.

Remember that the old vCenter was removed, so I couldn’t retrieve the KMS cluster ID from the vCenter KMS config screen, and this environment was undocumented since it is one of my labs (it is now documented). Now what?

Glad you asked. Let’s take a look at the encryption operation.

In this diagram we can see how the keys are distributed to vCenter, hosts, etc. The KMS server settings are passed to hosts from vCenter by the KEK_id.

In order to obtain the KMIP cluster ID, we need to look for it in the esx.conf file on the hosts. You can use cat, vi, or grep (easier) to look at the conf file. You want to look for kmipClusterId, name (alias), etc. Make sure the KMS cluster on the new vCenter is configured exactly as it was before.

cat /etc/vmware/esx.conf 

or something easier…

grep "/vsan/kmipServer/" /etc/vmware/esx.conf
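
If you want something reusable, here is a tiny wrapper around that grep. The function name kmip_settings is made up, and being able to point it at a saved copy of esx.conf is just a convenience I added.

```shell
# Print only the KMS-related entries from an esx.conf file.
# Defaults to the live file; pass another path to inspect a saved copy.
# kmip_settings is a hypothetical helper name.
kmip_settings() {
    conf="${1:-/etc/vmware/esx.conf}"
    grep '/vsan/kmipServer/' "$conf"
}

# kmip_settings                              # all KMS entries on this host
# kmip_settings | grep -i 'kmipClusterId'    # just the cluster ID entry
```

Write the values down somewhere safe this time, so you don’t need the hosts to tell you.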

After the KMS cluster has been added to the new vCenter exactly as it was configured in the old vCenter, there is no need for reboots. During reconfiguration the new credentials will be sent to all hosts, and the hosts should reload the keys for all disks within a few minutes.

 

vSAN 6.6 Encryption Configuration

New in vSAN 6.6: native encryption for data at rest. This feature does not require self-encrypting drives (SEDs). Encryption is supported on both all-flash and hybrid configurations of vSAN, and it is done at the datastore level.

It is important to note that data is encrypted during the de-staging process, which means that all other vSAN features are fully supported, such as deduplication and compression, among others.

Given the multitude of KMS vendors, the setup and configuration of the KMS is not covered in this document; it is a prerequisite for enabling encryption on the vSAN datastore.

Requirements for vSAN Encryption:

  • Deploy KMS cluster/server of your choice
  • Add/trust KMS server to vCenter UI
  • vSAN encryption requires on-disk format (ODF) version 5
    • You can upgrade this via Web Client
    • or if you enable Encryption or Deduplication and Compression on an existing vSAN cluster, the ODF gets upgraded to the latest version automatically.
  • When vSAN encryption is enabled all disks are reformatted
    • This is achieved in a rolling manner

 

Initial configuration is done in the VMware vCenter Server user interface of the vSphere Web Client. The KMS cluster is added to vCenter Server and a trust relationship is established. The process for doing this is vendor-specific. Consult your KMS vendor documentation prior to adding the KMS cluster to vCenter.

To add the KMS cluster to vCenter in the vSphere Web Client, click on the vCenter server, click the “Configure” tab, select “Key Management Servers”, and click “Add KMS”. Enter the information for your specific KMS cluster/server.

 

Once the KMS cluster/server has been added, you will need to establish trust with the KMS server. Follow the instructions from your KMS vendor as they differ from vendor to vendor.

 

After the KMS has been configured, you will see that the connection status and the certificate have green checks, meaning we are ready to move forward.

 

Now, we need to verify that all of the disks in the cluster are on version 5 for on-disk format prior to enabling vSAN encryption, since version 5 is a requirement.
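
If you prefer the command line over the UI for this check, something along these lines should do it per host. I am assuming the “esxcli vsan storage list” output carries a line labeled “On-disk format version” for each disk; the function name odf_versions is mine.

```shell
# Read "esxcli vsan storage list" output on stdin and list the distinct
# on-disk format versions found, one per line.
# Assumes each disk entry has a line like "On-disk format version: 5".
odf_versions() {
    grep -i 'on-disk format version' | awk -F': *' '{print $2}' | sort -u
}

# On an ESXi host:
#   esxcli vsan storage list | odf_versions
# A single line reading 5 means every claimed disk on that host is at version 5.
```

Anything other than a lone 5 means at least one disk still needs the on-disk format upgrade.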

 

 

At this point we are ready to turn encryption on, since we have completed the first three steps.

  • Deploy KMS cluster/server of your choice
  • Add/trust KMS server to vCenter UI
  • vSAN encryption requires on-disk format version 5
  • When vSAN encryption is enabled all disks are reformatted

 

To enable vSAN encryption, click on the vSAN cluster, “Configure” tab, and “General” under the vSAN section, and click “edit”. Here we have the option to erase the disk before use. This will increase the time it will take to do the rolling format of the devices, but it will provide better protection.

 

After you click OK, vSAN will remove one Disk Group at a time, format each device, and recreate the Disk Group once the format has completed. It will then move on to the next Disk Group until all Disk Groups are recreated and all devices formatted. During this period, data will be evacuated from the Disk Groups, so you will see components resyncing.

 

Note: This process can take quite some time depending on the amount of data that needs to be migrated during the rolling reformat, so please plan accordingly.

 

Once vSAN encryption is enabled, you can still disable it; however, doing so requires the same rolling reformat of all the drives.

 

New Key Generation

You also have the capability of generating new keys for encryption. There are 2 modes for rekeying. The first is a high-level (shallow) rekey, where the data encryption key is wrapped by a new key encryption key. The second is a deep rekey, a complete re-encryption of all data. A deep rekey may take significant time to complete, as all the data has to be re-written, and may decrease performance while it runs.

 

 

Summary of expected behaviors:

  • Enabling vSAN Encryption requires disk reformat with object resyncs.
  • You don’t have to erase all the disks prior to using native encryption unless you want to reduce the possibility of data leakage and shrink the attack surface. However, this will result in additional time required to erase disks, reformat drives, and enable encryption.
  • Enabling vSAN Deduplication and Compression still requires disk reformat with object resyncs whether the Disk Group is encrypted or not.
  • Disabling any of the aforementioned features requires another reformat of the devices along with object resyncs.