What is VM Overreserved and Why is it taking so much Space?

Today I saw a question in the VMTN communities and thought it was “blog worthy”. In essence, the question was “What is VM overreserved and why is it taking so much space?”, hence the very original title for this blog…

When we talk about VM overreserved, we are talking in the context of VMs within vSAN, more specifically objects within vSAN. There are a couple of KB articles explaining that VM overreserved is seen when deduplication and compression are disabled and Object Space Reservation is in use (OSR = non-zero). To quote the KB: “Used – VM Over-reserved: Space wasted due to higher than needed space reservation setting. Reducing object space reservation policy can free up space without the need to delete or move any data.”

What if OSR is set to 0%, but I still see a lot of space being consumed?

Remember, we are talking about objects. By default, vSAN thin-provisions objects on the back end; however, this does not apply to swap files, which are thick provisioned by default. To take it a step further, these swap objects use the default storage policy (FTT=1). This means the swap object in vSAN will consume “size of VM memory * 2 (FTT=1)”. If you raise the FTT on the default policy, the amount of space used increases accordingly (FTT=2 or FTT=3).
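The math is simple enough to sketch. Here is a minimal example (the 16 GB VM is just an illustrative figure) of how much raw capacity a thick-provisioned swap object reserves at each FTT level:

```shell
#!/bin/sh
# Raw capacity reserved by a thick-provisioned vSAN swap object.
# FTT=N stores N+1 copies, so reserved space = VM memory * (N + 1).
swap_gb() {  # $1 = VM memory in GB, $2 = FTT level
  echo $(( $1 * ($2 + 1) ))
}

vm_mem_gb=16  # example VM with 16 GB of configured memory
for ftt in 1 2 3; do
  echo "FTT=${ftt}: $(swap_gb "$vm_mem_gb" "$ftt") GB reserved for swap"
done
```

So a single 16 GB VM ties up 32 GB of raw capacity for swap at FTT=1, and 64 GB at FTT=3.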

If you do not wish to have this space “wasted”, you can disable thick-provisioned swap via the SwapThickProvisionDisabled advanced setting on the hosts… BUT proper planning should be done prior to changing this. I wrote a blog about this not long ago: https://greatwhitetec.com/2017/03/20/vsan-sparse-swap/

The VM overreserved metric appears when dedupe/compression is disabled. I ran a quick test in my lab to demonstrate this.

  • Hosts had SwapThickProvisionDisabled set to 0
  • Used – VM overreserved = 40 GB
  • Changed /VSAN/SwapThickProvisionDisabled to 1 (see my blog for how to…)
  • Turned VMs off and back on – remember, swap space is allocated at VM power-on and released at power-off.
  • Used – VM overreserved now = 0 GB

 

 

Replacing vCenter with vSAN Encryption Enabled

In my previous post, I talked about vSAN Encryption configuration and key re-generation, among other topics. In that post you can see that there is a trust relationship between vCenter and the KMS server/cluster. But what happens if my vCenter dies, gets corrupted, or I simply want to build a new vCenter and migrate my vSAN nodes to it with encryption enabled?

One day, the lab that hosts my vCenter had a power issue and the VCSA appliance became corrupted. I was able to fix the issue, but then discovered that SSO was not working. I figured it was faster to deploy a new VCSA appliance than to keep troubleshooting (yes, I’m impatient). I deleted the old vCenter and proceeded to deploy a new VCSA.

As I was adding the hosts to the new vCenter, I remembered that vSAN encryption was enabled. Now what? Sure enough, after all the hosts were moved, the drives within the Disk Groups were showing as unmounted. I went ahead and created a new relationship with the same KMS cluster, but the issue persisted.

If you run the command “vdq -q” from one of the hosts, you will see that your drives are not mounted and are ineligible for use by vSAN. In the UI you will see that your disks are encrypted and locked because the encryption key is not available.

The FIX:

In order to recover from this and similar scenarios, it is necessary to create a new KMS cluster profile with the exact same configuration as before. Although I did establish a relationship with the same KMS cluster, I missed a very important step: the KMS cluster ID.

It is imperative that the KMS cluster ID remain the same in order for the recovery to work. Let’s think about the behavior. Although the old vCenter is gone, the hosts still have the information and keys from the KMS cluster. If we connect to the same KMS cluster with the same cluster ID, the hosts will be able to retrieve the keys (assuming they still exist and were not deleted). The KMS credentials are then re-applied to all hosts so that the hosts can connect to the KMS and fetch the keys.

Remember that the old vCenter was removed, so I couldn’t retrieve the KMS cluster ID from the vCenter KMS config screen, and this environment was undocumented since it is one of my labs (it is now documented). Now what?

Glad you asked. Let’s take a look at the encryption operation.

In this diagram we can see how the keys are distributed to vCenter, hosts, etc. The KMS server settings are passed to the hosts from vCenter, along with the KEK ID.

In order to obtain the KMIP cluster ID, we need to look for it in the esx.conf file on the hosts. You can use cat, vi, or grep (easier) to inspect the file. Look for kmipClusterId, name (alias), etc. Make sure the KMS cluster on the new vCenter is configured exactly as it was before.

cat /etc/vmware/esx.conf

or something easier…

grep "/vsan/kmipServer/" /etc/vmware/esx.conf
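If you want just the cluster ID, grep can be combined with cut. The fragment below is a fabricated esx.conf sample for illustration (the values are placeholders, not from a real host), so the extraction can be shown end to end; on a real host you would point grep at /etc/vmware/esx.conf directly:

```shell
#!/bin/sh
# Fabricated esx.conf fragment for illustration only.
cat > /tmp/esx.conf.sample <<'EOF'
/vsan/kmipServer/kmipClusterId = "KMSCLUSTER-01"
/vsan/kmipServer/name = "kms-alias"
/vsan/kmipServer/address = "192.168.1.50"
/vsan/kmipServer/port = "5696"
EOF

# Pull out just the quoted cluster ID value
cluster_id=$(grep 'kmipClusterId' /tmp/esx.conf.sample | cut -d'"' -f2)
echo "KMS cluster ID: ${cluster_id}"
```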

After the KMS cluster has been added to the new vCenter exactly as it was configured in the old vCenter, there is no need for reboots. During reconfiguration the new credentials are sent to all hosts, and the hosts should reload the keys for all disks within a few minutes.

 

vSAN 6.6 Encryption Configuration

New in vSAN 6.6, native encryption for data at rest is now available. This feature does not require self-encrypting drives (SEDs). Encryption is supported on both all-flash and hybrid vSAN configurations, and it is done at the datastore level.

It is important to note that data is encrypted during the de-staging process, which means that all other vSAN features are fully supported, such as deduplication and compression, among others.

Given the multitude of KMS vendors, the setup and configuration of the KMS itself is not covered in this document; it is a prerequisite that must be completed prior to enabling encryption on the vSAN datastore.

Requirements for vSAN Encryption:

  • Deploy KMS cluster/server of your choice
  • Add/trust KMS server to vCenter UI
  • vSAN encryption requires on-disk format version 5
    • If the current on-disk format is below version 5, a rolling on-disk format upgrade will need to be completed prior to enabling encryption
  • When vSAN encryption is enabled all disks are reformatted
    • This is achieved in a rolling manner

 

Initial configuration is done in the VMware vCenter Server user interface of the vSphere Web Client. The KMS cluster is added to vCenter Server and a trust relationship is established. The process for doing this is vendor-specific. Consult your KMS vendor documentation prior to adding the KMS cluster to vCenter.

To add the KMS cluster to vCenter in the vSphere Web Client, click on the vCenter server, click on “Configure” tab, “Key Management Servers”, and click “add KMS”. Enter the information for your specific KMS cluster/server.

 

Once the KMS cluster/server has been added, you will need to establish trust with the KMS server. Follow the instructions from your KMS vendor as they differ from vendor to vendor.

 

After the KMS has been configured, you will see that the connection status and the certificate have green checks, meaning we are ready to move forward.

 

Now we need to verify that all of the disks in the cluster are on on-disk format version 5 prior to enabling vSAN encryption, since version 5 is a requirement.

 

 

At this point we are ready to turn encryption on, since we have completed the first three steps.

  • Deploy KMS cluster/server of your choice
  • Add/trust KMS server to vCenter UI
  • vSAN encryption requires on-disk format version 5
  • When vSAN encryption is enabled all disks are reformatted

 

To enable vSAN encryption, click on the vSAN cluster, the “Configure” tab, then “General” under the vSAN section, and click “Edit”. Here we have the option to erase the disks before use. This will increase the time it takes to do the rolling reformat of the devices, but it provides better protection.

 

After you click OK, vSAN will remove one Disk Group at a time, format each device, and recreate the Disk Group once the format completes. It will then move on to the next Disk Group until all Disk Groups are recreated and all devices are formatted. During this period, data is evacuated from the Disk Groups, so you will see components resyncing.

 

Note: This process can take quite some time depending on the amount of data that needs to be migrated during the rolling reformat, so please plan accordingly.

 

Once vSAN encryption is enabled, you can disable it again; however, the same procedure is required: all drives are reformatted in a rolling manner.

 

New Key Generation

You also have the ability to generate new keys for encryption. There are two modes of rekeying. The first is a shallow rekey, where the data encryption key (DEK) is wrapped by a new key encryption key (KEK). The other is a deep rekey: a complete re-encryption of all data. A deep rekey may take significant time to complete, since all the data has to be re-written, and it may decrease performance.
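To make the difference concrete, here is a conceptual sketch using openssl. This is not vSAN’s actual implementation, and the key and passphrases are made up; it only illustrates the envelope-encryption idea: the DEK encrypts the data, the KEK wraps the DEK, and a shallow rekey only re-wraps the DEK while the data stays untouched.

```shell
#!/bin/sh
# Conceptual illustration only, not vSAN's implementation.
DEK="0123456789abcdef0123456789abcdef"  # made-up data encryption key
KEK_OLD="old-kek-passphrase"            # made-up old key encryption key
KEK_NEW="new-kek-passphrase"            # made-up new key encryption key

# Wrap the DEK with the old KEK
printf '%s' "$DEK" | openssl enc -aes-256-cbc -pbkdf2 \
  -pass "pass:$KEK_OLD" -out /tmp/dek.wrapped

# Shallow rekey: unwrap with the old KEK, re-wrap with the new one.
# The DEK itself never changes, so no data has to be re-encrypted.
openssl enc -d -aes-256-cbc -pbkdf2 -pass "pass:$KEK_OLD" -in /tmp/dek.wrapped |
  openssl enc -aes-256-cbc -pbkdf2 -pass "pass:$KEK_NEW" -out /tmp/dek.rewrapped

# A deep rekey, by contrast, would generate a brand-new DEK and
# re-encrypt every block of data with it, which is why it is slow.
```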

 

 

Summary of expected behaviors:

  • Enabling vSAN Encryption requires disk reformat with object resyncs.
  • You don’t have to erase all the disks prior to enabling native encryption, unless you want to reduce the possibility of data leakage and shrink the attack surface. Doing so, however, adds time to erase the disks, reformat the drives, and enable encryption.
  • Enabling vSAN Deduplication and Compression still requires disk reformat with object resyncs whether the Disk Group is encrypted or not.
  • Disabling any of the aforementioned features requires another reformat of the devices along with object resyncs.

What’s new on vSAN 6.6

Today, one of the largest vSAN releases was announced. This release comes packed with new features, enhancements, and a lot of improvements, making vSAN 6.6 easier to deploy, better performing, and a more complete HCI platform.

What’s New with vSAN 6.6?

Native Encryption 

Encryption is one of the main features of this release. This is a software solution rather than one that relies on self-encrypting drives (SEDs), which, by the way, are not needed. Any HCI vendor can add SEDs and call their solution encryption ready, but vSAN goes a step further and provides software encryption for data at rest.

 

vSAN Configuration Assist 

vSAN Configuration Assist allows customers to check hardware compatibility, conduct burn-in tests, and check network and vSAN configurations, as well as get recommendations for an optimal cluster configuration based on current status. For example, Configuration Assist will check that all vSAN vmknics are configured properly and recommend upgrading the on-disk format to the latest version. Such recommendations allow for a configuration that follows vSAN best practices.

 

Hardware Lifecycle Management

This feature allows customers to update outdated controller firmware and driver versions, for example. In such cases the outdated hardware is identified and you have the option to download and install the latest version directly from the vSphere Web Client. This removes the need for vendor-specific tools, as it provides orchestrated hardware lifecycle management across the vSAN cluster.

 

Host Client vSAN visibility

Although vSAN is not heavily dependent on vCenter, when vCenter is not available we lose some visibility from the UI perspective. In vSAN 6.6, the HTML5 Host Client now has the ability to run health checks not only for the host itself but also for the entire cluster. Alternatively, you can use “esxcli vsan” commands for additional tasks; these commands have been expanded to keep up with the new features.

 

Web Client Health and Performance Monitoring 

The vSAN health function now has significantly more checks to aid in proper configuration and troubleshooting. Monitoring and alerts have also been added for the new features, such as physical disk health, networking, etc. On the performance diagnostics side, you are now able to query throughput, latency, and IOPS, among others.

 

Host Evacuation Pre-checks 

Built into the maintenance mode and disk/disk-group removal operations, the pre-check lowers operational overhead and reduces risk by helping ensure sufficient capacity remains after a host evacuation. The pre-check shows whether there is enough capacity for the data movement and how much data will be moved. I really like this feature, as it gives “what-if” visibility for each maintenance mode option.

 

Easy Deployment 

The new VCSA is now capable of deploying a vSAN cluster on a single node and placing the VCSA appliance on the vSAN datastore. This eliminates the need for external storage, manual disk claiming, or bootstrap scripts, making greenfield vSAN deployments quick and easy.

 

Multicast Dependency REMOVED 

Yes! Another big step here. Multicast is no longer required. In fact, once you upgrade your vCenter and hosts to version 6.6, the networking mode is automatically changed to unicast.

Proactive Drive HA 

vSAN 6.6 intelligently predicts device failures and proactively moves data off a failing device before it actually fails and causes a Permanent Device Loss (PDL) event.

 

 

Other Great Additions:

  • vRealize Management Pack for vSAN
  • Easier replacement of witness host on stretched-clusters
  • vSAN API and PowerCLI enhancements
  • Local Failure Protections for Stretched Clusters through Primary and Secondary FTTs
  • Stretched Cluster Site Affinity
  • Deduplication and Compression Performance Enhancements
  • Checksum optimization
  • Rebuild & Resync Enhancements (Partial Rebuilds)
  • Proactive and more aggressive de-staging

 

I’ll be writing a few more blogs about the new features. Stay tuned.

 

vSAN Sparse Swap

Although this is not a new feature, it still seems to be little known by many, and it can save you quite a bit of disk space. Prior to vSAN 6.2, all VMs were created reserving 100% of their configured memory for swap when no memory reservations were defined. Because vSAN makes copies of objects, we would need 2x the size of each VM’s memory reserved for swap space. Remember that the default policy is FTT=1 (2 copies), and swap, being a “special” object, always gets the default policy (it is not subject to SPBM, so no RAID-5/6). On average, VMs have ~4-8 GB of memory, depending on the workload of course. When we multiply this by the number of VMs, we can end up reserving quite a bit of storage for swap objects. If we assume ~1,000 VMs per cluster, we would use ~8 TB of space, in theory.
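The back-of-the-envelope math from that paragraph, as a quick sketch (1,000 VMs and 4 GB average memory are just the assumed figures from above):

```shell
#!/bin/sh
# Cluster-wide raw capacity reserved by thick-provisioned swap objects.
vms=1000       # assumed VM count
avg_mem_gb=4   # assumed average configured memory per VM
copies=2       # FTT=1 keeps 2 copies of every object
total_gb=$((vms * avg_mem_gb * copies))
echo "~${total_gb} GB (~$((total_gb / 1000)) TB) reserved just for swap"
```

That is roughly 8 TB of raw capacity tied up before a single VM even swaps.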

 

When is swap needed?

When you overcommit memory, swap will be needed; when you don’t overcommit, you may benefit from disabling thick-provisioned swap in order to save some space.

This is an advanced config setting applied per host in the cluster. You can configure it through the UI (6.5+), CLI, PowerCLI, and even Host Profiles.

 

CLI

get: esxcfg-advcfg -g /VSAN/SwapThickProvisionDisabled

0 = thick provisioning enabled (the default); swap objects reserve their full size

1 = thick provisioning disabled; swap objects are sparse, yielding the space savings

set: esxcfg-advcfg -s 1 /VSAN/SwapThickProvisionDisabled

 

UPDATE ***another command using esxcli***

list: esxcli system settings advanced list -o /VSAN/SwapThickProvisionDisabled

set: esxcli system settings advanced set -o /VSAN/SwapThickProvisionDisabled -i 1
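There is no cluster-wide switch on the CLI side; the setting has to be applied host by host. A small sketch that simply prints the command for each host (the hostnames are made-up placeholders; you could feed the output to ssh or run each line manually):

```shell
#!/bin/sh
# Emit the per-host esxcli command; hostnames are made-up examples.
build_cmd() {  # $1 = 0 (thick, the default) or 1 (sparse swap)
  echo "esxcli system settings advanced set -o /VSAN/SwapThickProvisionDisabled -i $1"
}

for host in esxi-01.lab.local esxi-02.lab.local esxi-03.lab.local; do
  echo "${host}: $(build_cmd 1)"
done
```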

PowerCLI

A colleague of mine, Jase McCarty, created a PowerCLI script to set this at the cluster level… Talk about the “easy button”.

https://github.com/jasemccarty/Vsan-Settings/blob/master/Vsan-SetSwapFiles.ps1

 

Host Profiles

Host Profiles work the same way whether vSAN is enabled or not: you still create the profile from a host, apply it, check compliance, and remediate. You will, of course, see more settings if vSAN is enabled.

As mentioned, this is an advanced configuration setting, so it is not visible in the host profile unless it was set to 1 prior to extracting the profile from one of your hosts. That means you will have to manually set it on one host, and then extract the profile for it to be visible in the Web Client.

 

UI

vSAN 6.5 – Host > Configure > Advanced System Settings > Edit > VSAN.SwapThickProvisionDisabled > set to 1

 

Friendly Advice: Again, if you are overcommitting memory on your hosts, DO NOT disable thick-provisioned swap (leave SwapThickProvisionDisabled at 0).