VUM Host upgrade fails – Another task is already in progress

I came across this issue a few times already, so I figured it would be good to share the findings, especially since it took me down the rabbit hole.

First off, the error “Another task is already in progress” is kind of obvious but yet does not provide enough detail. I ran into this trying to upgrade 4 hosts in a cluster via VUM, no tasks were running at the time of the upgrade start. This was initiated from VMware Cloud Foundation. Thinking the issue was with VCF I tried to do a direct upgrade from VUM, skipping VCF… but received the same error. Also tested with/without vSAN. NFS, etc.

The issue appeared to be related to a race condition where both the VUM remediation AND this “other task” were trying to run at the same time on that host. From the task console we can see that there was another “install” operation running as soon as VUM was trying to remediate. The Initiator give me a hint, and we can see vcHms “cut in line” a did an install operation.

I started looking into the esxupdate log (/var/log/esxupdate.log), and noticed that the host was trying to install a vib. But this vib was already installed, so what gives?

The name of the initiator and one of the vibs gave me a clue (vr2c-firewall.vib). I was running vSphere Replication on that particular cluster, so I started digging in. To validate my suspicion, I shutdown the VR appliance and attempted to upgrade one host. The upgrade worked as expected with no errors, so I was pretty certain something within vSphere Replication was causing this.

First I needed to ssh into the VR appliance or use the console. You won’t be able to ssh into the appliance before enabling ssh. Log in to the console with the root password and run the script to enable ssh.

/usr/bin/enable-sshd.sh

Within the appliance I looked into the config settings for HMS and discovered that there were 2 vibs that were set to auto install at host reconnect. So it appears there is a race condition when VUM and VR and both trying to do an install task at the same time (at reconnect). Makes perfect sense now.

VR_vib_autoinstall

The ESXI update logs indicated that the vr2c-firewall.vib was the one trying to install. After re-checking the vibs (esxcli software vib list) on all the hosts, I did see this vib was already installed but the task from VR kept trying to install at reconnect.

As I workaround, I decided to disable the auto install of this particular vib by running the following command within the /opt/vmware/hms/bin directory and then restart the hms service:

./hms-configtool -cmd reconfig -property hms-auto-install-vr2c-vib=false

service hms restart

hms-configtool

This workaround worked as a charm and I was able to upgrade the rest of the cluster using VUM. I did not find an official KB about this, and this is by no means an official workaround/fix.

Disclaimer: If you plan to implement this fix, be aware that this is not an official VMware blog, and changes to products may or may not cause issues and/or affect support.