VSAN Proactive Rebalance

There have been a lot of questions about what happens when a rebalance task is triggered in VSAN. By default, VSAN will attempt a proactive rebalance of objects as the disks start hitting certain thresholds (80%). In other cases, such as during failures and rebuilds, or when an organic imbalance is discovered, administrators may trigger a proactive rebalance task themselves.

What happens

Once you click the “balance disks” button, you open a 24-hour window in which the rebalance will take place. This means that the rebalance operation may take up to 24 hours, so be patient. Many people have voiced frustration because the UI shows 5% progress (or a lack thereof) for a very long time, making the task appear to be stuck. The rebalance is taking place in the background.

You may also not see any progress at all for the first 30 minutes. This is because VSAN waits to make sure that the imbalance persists before it attempts to move any objects around. After all, the rebalance task moves objects between disks/nodes, and copying data over the network takes resources, bandwidth, and time, so plan accordingly if you must rebalance.
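The "wait until the imbalance persists" idea can be illustrated with a small sketch. This is not VSAN's actual logic: the function name, the sample format, and the assumption that the 30 minutes acts as a continuous-observation window are all hypothetical, chosen only to show why nothing moves right away.

```python
from typing import Iterable, Tuple

# From the text: no progress may be visible for roughly the first 30 minutes.
PERSIST_WINDOW_S = 30 * 60


def imbalance_persisted(
    samples: Iterable[Tuple[float, bool]],
    now: float,
    window_s: float = PERSIST_WINDOW_S,
) -> bool:
    """Hypothetical check: has the imbalance been seen continuously for window_s?

    samples is a sequence of (timestamp_seconds, imbalanced) pairs in
    ascending time order. How "imbalanced" is computed is left to the caller.
    """
    streak_start = None
    for ts, imbalanced in samples:
        if imbalanced:
            # Remember when the current uninterrupted streak began.
            if streak_start is None:
                streak_start = ts
        else:
            # Any balanced observation resets the streak.
            streak_start = None
    return streak_start is not None and now - streak_start >= window_s
```

With this kind of check, a brief imbalance that clears on its own never triggers data movement, which is the behavior described above.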

Background Tasks:

  • The task is at 1 percent when created.
  • The task moves to 5 percent when the rebalance command is triggered.
  • The task then waits for the rebalance to complete before setting the percent done to 100.
    • During the waiting period, it checks whether the rebalance is done (clom-tool command).
    • If not done, it sleeps for 100 seconds and then checks again.
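The steps above can be sketched as a simple polling loop. This is a minimal illustration rather than the actual VSAN task code: the function name and the three callbacks are hypothetical stand-ins (the real completion check goes through a clom-tool command, which is not reproduced here), and leaving the progress at 5 percent when the window expires is an assumption.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 100          # from the text: sleep 100 seconds between checks
MAX_WINDOW_S = 24 * 60 * 60    # from the text: 24-hour window when started from the UI


def run_rebalance_task(
    trigger_rebalance: Callable[[], None],   # hypothetical: starts the rebalance
    is_rebalance_done: Callable[[], bool],   # hypothetical: wraps the completion check
    report_progress: Callable[[int], None],  # hypothetical: updates the task percent
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if the rebalance finished, False if the window expired first."""
    report_progress(1)            # task created
    trigger_rebalance()
    report_progress(5)            # rebalance command triggered
    deadline = clock() + MAX_WINDOW_S
    while clock() < deadline:
        if is_rebalance_done():
            report_progress(100)  # only set once the rebalance is complete
            return True
        sleep(POLL_INTERVAL_S)
    # Assumption: on expiry the progress is left as-is.
    return False
```

Because the loop only ever reports 1, 5, and 100, the task has no intermediate values to show, which is why the UI appears to sit at 5% for hours while data is moving in the background.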

By default, when triggered from the VC UI, the task will run for 24 hours or until the rebalance effort is done, whichever comes first.

Note that if your disks are balanced, the button is greyed out to avoid unnecessary object “shuffling”.
