VSAN 6.2 Performance Degradation (Hybrid)

In vSAN (not misspelled) 6.2, dedup and compression were introduced. These features, however, apply only to all-flash configurations and must not be set up on Hybrid environments.

Some customers have experienced performance degradation on 6.2 Hybrid environments compared to 6.0 or 6.1. Read caching performance degradation can be observed for Hybrid Disk Groups on the SSD cache tier, due to low-level scanning for unique blocks (dedup). Although this scanning is normal for All-Flash environments, it is important to check the hosts participating in a Hybrid cluster to make sure it is turned OFF.

To check or change this option, you can use the ESXi Shell or PowerCLI.

The setting shows “2” when it is turned ON and “0” when it is turned OFF. It should be set to “0” on EACH Hybrid host.

Check Setting

ESXi Shell – esxcfg-advcfg -g /LSOM/lsomComponentDedupScanType 

PowerCLI – Get-VMHost <HostName> | Get-AdvancedSetting -Name LSOM.lsomComponentDedupScanType
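Because every Hybrid host must report “0”, it can help to sweep the ESXi Shell check output across several hosts at once. The Python sketch below is a minimal illustration only. It assumes `esxcfg-advcfg -g` prints a line of the form `Value of lsomComponentDedupScanType is 2` (verify this on your own build) and that you have already collected that output per host. The host names and helper functions are hypothetical.

```python
from __future__ import annotations

import re

# ASSUMPTION: output format of `esxcfg-advcfg -g /LSOM/lsomComponentDedupScanType`
# looks like "Value of lsomComponentDedupScanType is 2". Verify on your hosts.
_VALUE_RE = re.compile(r"Value of lsomComponentDedupScanType is (\d+)")

# The change command from the "Change Setting" section, printed as a reminder.
FIX_CMD = "esxcfg-advcfg -s 0 /LSOM/lsomComponentDedupScanType"


def parse_scan_type(output: str) -> int:
    """Extract the integer setting value (2 = ON, 0 = OFF) from command output."""
    match = _VALUE_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognized output: {output!r}")
    return int(match.group(1))


def noncompliant_hybrid_hosts(outputs: dict[str, str]) -> list[str]:
    """Return (sorted) hybrid hosts whose dedup scan setting is not 0."""
    return sorted(host for host, out in outputs.items() if parse_scan_type(out) != 0)


if __name__ == "__main__":
    # Hypothetical output captured from three hybrid hosts.
    captured = {
        "esx01": "Value of lsomComponentDedupScanType is 0",
        "esx02": "Value of lsomComponentDedupScanType is 2",
        "esx03": "Value of lsomComponentDedupScanType is 0",
    }
    for host in noncompliant_hybrid_hosts(captured):
        print(f"{host}: scanning ON, run: {FIX_CMD}")
```

With the sample data, only `esx02` is flagged, and the script prints the change command to run on it.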

Change Setting

ESXi Shell – esxcfg-advcfg -s 0 /LSOM/lsomComponentDedupScanType 

PowerCLI – Get-VMHost <HostName> | Get-AdvancedSetting -Name LSOM.lsomComponentDedupScanType | Set-AdvancedSetting -Value 0

Using PowerCLI is my preference, since you don’t have to enable SSH on the hosts, and you can use wildcards to check or change all the hosts with little effort.

VSAN Proactive Rebalance

There have been a lot of questions about what happens when a rebalance task is triggered in VSAN. By default, VSAN automatically rebalances objects once disks start hitting a capacity threshold (80%). There are instances, such as during failures and rebuilds, or when organic imbalance is discovered, where administrators may trigger a proactive rebalance task manually.
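To make the 80% threshold concrete, here is a minimal Python sketch that flags capacity disks at or above 80% used. It assumes the threshold is a simple used/capacity ratio per disk. The disk IDs and the usage dictionary are hypothetical, and this is not VSAN's internal placement logic.

```python
from __future__ import annotations

# Fullness threshold quoted in the text, treated here as a simple cutoff (assumption).
REBALANCE_THRESHOLD = 0.80


def disks_over_threshold(usage: dict[str, tuple[float, float]],
                         threshold: float = REBALANCE_THRESHOLD) -> list[str]:
    """Return (sorted) disk IDs whose used/capacity ratio has reached the threshold.

    `usage` maps a hypothetical disk ID to (used_gb, capacity_gb).
    """
    flagged = []
    for disk, (used, capacity) in usage.items():
        if capacity <= 0:
            raise ValueError(f"{disk}: capacity must be positive")
        if used / capacity >= threshold:
            flagged.append(disk)
    return sorted(flagged)


if __name__ == "__main__":
    sample = {"disk-a": (850.0, 1000.0), "disk-b": (400.0, 1000.0)}
    print(disks_over_threshold(sample))
```

Here `disk-a` (85% used) is flagged and `disk-b` (40% used) is not.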

What happens

Once you click the “balance disks” button, you open a 24-hour window during which rebalancing takes place. This means the rebalance operation may take up to 24 hours, so be patient. Many people have voiced frustration because the UI shows 5% progress (or lack thereof) for a very long time, making the task appear stuck. The rebalance is actually taking place in the background.

You may also see no progress at all for the first 30 minutes. This is because VSAN waits to make sure the imbalance persists before it moves any objects around. After all, the rebalance task moves objects between disks and nodes, and copying that data over the network consumes resources, bandwidth, and time, so plan accordingly if you must rebalance.
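The 30-minute wait amounts to a persistence check: seeing an imbalance once is not enough, and it has to remain observed for the whole window. Here is a minimal Python sketch of that idea, assuming time-ordered `(timestamp_seconds, imbalanced)` samples. The function name and sample format are hypothetical, not VSAN's implementation.

```python
from __future__ import annotations

PERSISTENCE_WINDOW_S = 30 * 60  # 30 minutes, as described in the text


def imbalance_persisted(samples: list[tuple[float, bool]],
                        now: float,
                        window: float = PERSISTENCE_WINDOW_S) -> bool:
    """True if imbalance was observed continuously for the last `window` seconds.

    `samples` is a time-ordered list of (timestamp_seconds, imbalanced) pairs.
    """
    start = now - window
    # The imbalance must already have been observed at (or before) the window start...
    before = [imb for ts, imb in samples if ts <= start]
    if not before or not before[-1]:
        return False
    # ...and must never have cleared inside the window.
    return all(imb for ts, imb in samples if start < ts <= now)


if __name__ == "__main__":
    steady = [(0, True), (600, True), (1200, True), (1800, True)]
    blip = [(0, True), (900, False), (1800, True)]
    print(imbalance_persisted(steady, now=1800), imbalance_persisted(blip, now=1800))
```

The steady series qualifies after 30 minutes. The series whose imbalance briefly cleared does not, so no objects would be moved yet.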

Background Tasks:

  • Task at 1 percent when created.
  • Task at 5 percent when rebalance command is triggered.
  • Then waits for the rebalance to complete before setting the percent done to 100.
    • During the waiting period, it checks whether the rebalance is done (clom-tool command).
    • If not, it sleeps for 100 seconds and checks again.

By default, when triggered from the VC (vCenter) UI, the task runs for 24 hours or until the rebalance effort is done, whichever comes first.
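The progress behavior described in this section can be modelled as a simple polling loop: 1% on creation, 5% once the rebalance command is issued, and 100% only when the rebalance completes. The loop polls every 100 seconds within the default 24-hour window. This Python sketch is illustrative only. `trigger` and `is_done` are hypothetical callbacks standing in for the real rebalance command and the clom-tool check. The sketch also assumes the task simply stops without reaching 100% when the window expires, which the text does not specify. The clock and sleep functions are injectable so the loop can be exercised without actually waiting.

```python
from __future__ import annotations

import time
from typing import Callable

POLL_INTERVAL_S = 100         # sleep between "is it done?" checks
DEFAULT_WINDOW_S = 24 * 3600  # default 24-hour window when triggered from the VC UI


def run_rebalance_task(trigger: Callable[[], None],
                       is_done: Callable[[], bool],
                       report: Callable[[int], None],
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep,
                       window_s: float = DEFAULT_WINDOW_S) -> bool:
    """Model of the task's progress reporting.

    Returns True if the rebalance finished inside the window, False if it expired.
    """
    report(1)                     # task created
    trigger()                     # stands in for issuing the rebalance command
    report(5)                     # rebalance command triggered
    deadline = clock() + window_s
    while True:
        if is_done():             # stands in for the clom-tool check
            report(100)
            return True
        if clock() >= deadline:   # window expired first (assumed: no 100% report)
            return False
        sleep(POLL_INTERVAL_S)


if __name__ == "__main__":
    # Simulated clock so the example finishes instantly.
    fake_now = [0.0]
    progress: list[int] = []
    checks = iter([False, False, True])  # rebalance reports done on the third check
    finished = run_rebalance_task(
        trigger=lambda: None,
        is_done=lambda: next(checks),
        report=progress.append,
        clock=lambda: fake_now[0],
        sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
    )
    print(finished, progress, fake_now[0])
```

With the simulated clock, the run finishes on the third check after two 100-second sleeps and prints `True [1, 5, 100] 200.0`. The long stretch at 5% in the UI corresponds to the time spent inside the polling loop.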

Notice that if your disks are balanced, the button is greyed out to avoid unnecessary object “shuffling”.
