vSAN 6.7 Upgrade Considerations

On April 17, 2018, VMware released vSphere 6.7. This includes vCenter, ESXi, and of course vSAN. A lot of people are looking to upgrade in order to take advantage of the new features and enhancements, primarily the H5 client… goodbye, Flash client! More information on what’s new is available for both vSphere and vSAN.

From an HTML5 client perspective, feature parity is about 95%. For vSAN alone, it is about 99%, as the only thing missing is Configuration Assist, which is still available via the Flash client.

I see a lot of people getting confused and still believing that vSAN is a separate upgrade, just like traditional storage. Fortunately, vSAN is in the kernel, so once you upgrade ESXi you have also upgraded vSAN. Boom!!! Even though the version numbers may not match exactly between ESXi and vSAN, they still go hand-in-hand. With that, it is important to understand the steps necessary for a vSAN upgrade.

Because vSAN is part of ESXi, we need to follow the vSphere upgrade path. This includes checking the VMware Product Interoperability Matrices, not only for your current versions against the versions you are upgrading to, but also for all the other VMware products in your environment, such as NSX, SRM, vROps, etc.

Upgrade Process Overview

From an upgrade process perspective, you have options. You can migrate your Windows vCenter to the vCenter Appliance (recommended). If you already have the vCenter Appliance, you can either do an in-place upgrade or deploy a fresh vCenter and migrate your hosts over to it. More information is available in the vSphere upgrade documentation.

  1. Upgrade PSC
  2. Upgrade, Migrate, or deploy new vCenter
    1. Depends on current version
  3. Upgrade ESXi
    1. This will also upgrade vSAN (easy, right?)
  4. Upgrade vSAN on-disk Format (ODF)
  5. Upgrade VMware Tools
  6. Upgrade vDS

As previously discussed, you will need to check the Product Interoperability Matrix to make sure all your products can run on vSphere 6.7. Don’t shoot from the hip and start upgrading before you have done proper research.

I mentioned the choice of migrating hosts to a new vCenter. This is something I do quite often in my labs, and it is a simple process.

Migration Process Overview

  1. Export vDS configuration (including port groups)
  2. Copy licenses from old vCenter
  3. Configure new vCenter
  4. Create Datacenter in vCenter
  5. Create a Cluster and enable vSAN on it
    1. If you currently have other services enabled, they will have to be enabled on the new vCenter as well prior to moving the hosts.
  6. Add licenses
    1. Assign license to vCenter
    2. Assign vSAN license to cluster asset
  7. Create vDS on the new vCenter
  8. Import configuration and port groups to new vCenter
  9. On the new vCenter, add hosts
    1. No need to disconnect hosts on the old vCenter; they will disconnect after connecting to the new vCenter.
    2. Confirm ESXi license or assign a new one.
  10. Connect the hosts to the vDS (imported)
    1. Make sure you go through and verify assignment of uplinks, failover order, vmkernel ports, etc.
  11. Lastly, you will need to tell vSphere that this new vCenter is now authoritative
    1. You will get an alert about this

Considerations when Enabling vSAN Encryption

In previous posts, I talked about the vSAN Encryption architecture and how to enable the feature. However, aside from the requirements, there are a couple of considerations that should be taken into account before enabling vSAN Encryption.

BIOS Settings:

With most deployments, whether vSphere or vSAN, I’ve noticed that BIOS settings are often overlooked, even though a simple change there can help increase performance. One of those settings is AES-NI. AES-NI was proposed by Intel some time back, and it is essentially a set of [new] instructions (NI) for the Advanced Encryption Standard (AES); hence the acronym AES-NI. What AES-NI does is provide hardware acceleration to applications using AES for encryption and decryption.

Most modern CPUs (Intel & AMD) support AES-NI, and some hardware vendors ship BIOS configurations with AES-NI already enabled by default. When considering vSAN Encryption, it is imperative to make sure that AES-NI has been enabled in the BIOS, so that AES operations are handled by dedicated CPU instructions, which both accelerates and hardens the execution of applications using AES.

Failure to enable AES-NI while Encryption is enabled may result in a dramatic increase in CPU utilization. In recent versions of vSAN, the Health Check UI detects and alerts when AES-NI has not been enabled. If the BIOS does not have an option to enable AES-NI, the feature is most likely always enabled.

Note: This also applies to VM encryption.
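As a side illustration, here is a minimal Python sketch of checking for the AES-NI CPU flag on a regular x86 Linux machine, where the feature appears as "aes" in the flags line of /proc/cpuinfo. This is an assumption-laden example and not an ESXi procedure: ESXi does not expose that file, so on hosts you should rely on the vSAN Health Check. The function name has_aes_ni is made up for this sketch.

```python
def has_aes_ni(cpuinfo_text):
    """Return True if a 'flags' line in Linux-style cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the 'flags' line counts; 'aes' must be a whole token.
        if sep and key.strip() == "flags" and "aes" in value.split():
            return True
    return False


# Sample text standing in for the contents of /proc/cpuinfo (illustrative only).
sample = "processor : 0\nflags : fpu vme sse2 aes avx\n"
print(has_aes_ni(sample))
```

On a real Linux system you would pass in the text read from /proc/cpuinfo instead of the sample string.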

Available Space

The other consideration is available space. My previous posts talk about the data migration that occurs if vSAN Encryption is enabled after data has already been moved into the vSAN Datastore, because of the disk format change it requires. Although vSAN Encryption does not incur a space overhead for its operation, keep in mind that there needs to be enough available space to evacuate an entire disk group during the configuration process.
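To make the space requirement concrete, here is a rough Python sketch that checks whether the used capacity of each disk group would fit in the free space of the remaining disk groups. It is a back-of-the-envelope check under simplifying assumptions: it ignores storage policy placement rules, fault domains, and slack space, and the function name and sample numbers are hypothetical.

```python
def can_evacuate_each_disk_group(disk_groups):
    """disk_groups: list of (used_gb, capacity_gb) tuples, one per disk group.

    Returns True if every disk group's used data would fit in the free
    space of all the other disk groups combined.
    """
    total_free = sum(cap - used for used, cap in disk_groups)
    for used, cap in disk_groups:
        # Free space inside the group being evacuated does not help.
        free_elsewhere = total_free - (cap - used)
        if used > free_elsewhere:
            return False
    return True


# Hypothetical cluster with three disk groups of 1000 GB each.
groups = [(600, 1000), (700, 1000), (500, 1000)]
print(can_evacuate_each_disk_group(groups))
```

If the check fails for your numbers, that is a hint to free up or add capacity before enabling encryption on a populated datastore.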

Considerations for using LACP with vSAN

I am a firm believer in spending a good amount of time on the design phase of any deployment. Planning ahead and knowing your limitations will make your life easier; maybe just a little bit, but every bit helps.

If you are planning on using LACP for link aggregation on vSAN, I strongly advise you to get familiar with your options and check the Network Design Guide at storagehub.vmware.com. In the Network Design Guide you will learn about NIC teaming options and LACP requirements such as LAG and vDS, as well as the PROs and CONs (below).

Pros and Cons of dynamic Link Aggregation/LACP (from Storagehub)

Pros

  • Improves performance and bandwidth: One vSAN node or VMkernel port can communicate with many other vSAN nodes using many different load balancing options
  • Network adapter redundancy: If a NIC fails and the link-state goes down, the remaining NICs in the team continue to pass traffic.
  • Rebalancing of traffic after failures is fast and automatic

Cons

  • Physical switch configuration: Less flexible and requires that physical switch ports be configured in a port-channel configuration.
  • Complex: Introducing full physical redundancy configuration gets very complex when multiple switches are used. Implementations can become quite vendor specific.

In addition to this, you should also consider vSphere limitations while using LACP. Some of the limitations include:

  • LACP settings not available for host profiles
  • No port mirroring
  • Does not work with ESXi dump collector.

For a complete list of limitations, see VMware Docs.

Be cognizant of some “gotchas” as well. For example, restarting management agents on ESXi with vSAN/LACP using the services.sh script may cause issues, because “services.sh restart” also restarts the LACP daemon (/etc/init.d/lacp). Instead, use the “/etc/init.d/<module> restart” command to restart individual services. See KB 1003490.
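A small Python sketch of the idea: build one restart command per service and leave the LACP daemon out. It only prints the commands and does not run anything. The service names hostd and vpxa are the usual management agents; the helper name and the skip list are assumptions for illustration.

```python
def restart_commands(services, skip=("lacp",)):
    """Build per-service restart commands, leaving out anything in skip."""
    return [f"/etc/init.d/{name} restart" for name in services if name not in skip]


# 'lacp' is included on purpose to show that it gets filtered out.
for cmd in restart_commands(["hostd", "vpxa", "lacp"]):
    print(cmd)
```

Review the printed commands before running any of them on a host.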

Like with any other deployment, you should consider your PROs, CONs, future plans, and environment dependencies among others.

Remember the 7Ps – Proper Prior Planning Prevents Pitiful Poor Performance

Get Inbox drivers for Storage Controllers with vSAN

If you are familiar with vSAN, you already know that the VCG is a “must follow” guide to success. In certain instances, deployments may use ESXi OEM/custom images based on internal policies or even personal preference. However, these types of images contain vendor-specific tools, drivers, etc. One result of using such images is that storage controllers end up on async drivers rather than inbox drivers. For the sake of demonstration, we can focus on the PERC H730.

While checking Configuration Assist, you can see a warning stating that the recommended driver per the VCG is not currently installed. In this case, we have uncertified drivers and need to roll the drivers back to the correct version.

Links to download such drivers are typically found within the VCG or on the ESXi “drivers” tab of the downloads page. On some occasions, this link may not be present for the version that you are running. So how do you get the correct drivers? You certainly don’t want to be running drivers that have not been certified for vSAN, do you?! Of course NOT.

You can get such drivers from the ESXi offline bundle of the version you currently have. For example, let’s say you are running ESXi 6.5 U1. You will need to go to https://my.vmware.com, log in using your credentials, go to downloads, and select ESXi 6.5 U1 as the product. Download the Offline Bundle (zip). Once completed, unzip the file and navigate the vib folder until you find the correct driver. In this case we are looking for lsi_mr3 version 6.910.18.00-1vmw.650.0.0.4564106. You can then take that vib and use your preferred vib install method to update your drivers.
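If you would rather not dig through the folders by hand, a short Python sketch like the following can list the matching .vib entries inside the offline bundle zip. The helper name find_vibs is made up, and the entry names in the demo are invented for illustration; real bundle paths and file names will differ. Underscores and hyphens are treated as equivalent on the assumption that a driver module name and its VIB file name may not use the same separator.

```python
import io
import zipfile


def find_vibs(bundle, fragment):
    """Return .vib entries in a zip whose path contains fragment.

    bundle can be a file path or a file-like object.
    """
    wanted = fragment.lower().replace("_", "-")
    with zipfile.ZipFile(bundle) as zf:
        return [
            name for name in zf.namelist()
            if name.lower().endswith(".vib")
            and wanted in name.lower().replace("_", "-")
        ]


# Demo with an in-memory zip; the entry names are made up for illustration.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("vib20/lsi-mr3/example-lsi-mr3.vib", b"")
    zf.writestr("vib20/other/example-other.vib", b"")
    zf.writestr("metadata.zip", b"")
demo.seek(0)
print(find_vibs(demo, "lsi_mr3"))
```

Against a real download you would pass the path of the offline bundle zip instead of the in-memory demo, then extract the entry it reports and install it with your preferred method.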

What is VM Overreserved and Why is it taking so much Space?

Today I saw a question in the VMTN communities and thought it was “blog worthy”. In essence the question was “What is VM Overreserved and Why is it taking so much space?” hence the very original title for this blog…

When we talk about VM overreserved, we are talking in the context of VMs within vSAN, more specifically objects within vSAN. There are a couple of KB articles explaining that VM overreserved is seen when deduplication and compression are disabled and Object Space Reservation is in use (OSR set to a non-zero value). To quote the KB: “Used – VM Over-reserved: Space wasted due to higher than needed space reservation setting. Reducing object space reservation policy can free up space without the need to delete or move any data.”

What if OSR is set to 0%, but I still see a lot of space being consumed???

Remember, we are talking about objects. By default, vSAN thin-provisions objects on the back end; however, this does not apply to swap files, which are thick provisioned by default. To take it a step further, swap objects use the default policy (FTT=1 by default). This means that the swap object in vSAN will consume “size of memory of VM * 2 (FTT=1)”. If you raise the FTT on the default policy to 2 or 3, the amount of space used will increase accordingly.
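The arithmetic can be sketched in a few lines of Python. This assumes mirroring, where tolerating FTT failures means keeping FTT + 1 copies of the data, and it ignores witness components and any memory reservation on the VM. The function name and the 16 GB example VM are hypothetical.

```python
def swap_object_footprint_gb(vm_memory_gb, ftt=1):
    """Estimate raw capacity used by a thick swap object with mirroring.

    Mirroring keeps ftt + 1 full copies of the object.
    """
    return vm_memory_gb * (ftt + 1)


# Hypothetical VM with 16 GB of memory at different FTT values.
for ftt in (1, 2, 3):
    print(ftt, swap_object_footprint_gb(16, ftt))
```

Multiply that by the number of powered-on VMs and it becomes clear how swap alone can account for a large “VM overreserved” figure.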

If you do not wish to have this space “wasted”, you can turn off thick provisioning for swap with the SwapThickProvisionDisabled advanced setting on the hosts… BUT proper planning should be done before changing this. I wrote a blog about this not long ago: https://greatwhitetec.com/2017/03/20/vsan-sparse-swap/

The VM overreserved category appears when dedupe/compression is disabled. I did a quick test in my lab to demonstrate this.

  • Hosts had SwapThickProvisionDisabled set to 0
  • Used – VM overreserved = 40GB
  • Changed /VSAN/SwapThickProvisionDisabled to 1  (see my blog for how to…)
  • Turned VMs off/back on – remember, swap space is taken during VM boot and released when turned off.
  • Space of VM overreserved now = 0 GB
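For reference, the setting change in the test above can be expressed as esxcfg-advcfg commands run on each host. The Python sketch below only builds and prints the command strings so they can be reviewed first; it does not execute anything, and the helper names are made up for illustration.

```python
OPTION = "/VSAN/SwapThickProvisionDisabled"


def get_command(option=OPTION):
    """Command string that reads the current value of the advanced setting."""
    return f"esxcfg-advcfg -g {option}"


def set_command(value, option=OPTION):
    """Command string that sets the advanced setting to 0 or 1."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    return f"esxcfg-advcfg -s {value} {option}"


print(get_command())
print(set_command(1))
```

Remember that the change only takes effect for a VM's swap object the next time that VM is powered on.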