vCenter Server Reduced Downtime Upgrade

I have seen some questions coming in about the Reduced Downtime Upgrade feature lately, so I figured I’d share some more information about it. This feature was introduced in vSphere 7.0 Update 3 and provides a new way of doing migration-based upgrades for vCenter Server.

Reduced Downtime Upgrade (RDU) simplifies the migration process and, as the name implies, reduces downtime for vCenter: the data is moved/copied from the old vCenter to the new vCenter while the old one stays online, so the only downtime happens when the services are stopped on the old vCenter and started on the new one. The data is copied almost in a vMotion type of way. Pretty slick.

The main question I see is: Does this apply to all deployment types including On-Premises and Cloud deployments?

The answer is NO. This feature (as of right now) only applies to VMC on AWS and Project Arctic. So for now, RDU is not available or supported for on-premises deployments, but that’s not to say it won’t be supported on-premises in the future. Also, RDU is only available via API at the moment, and for the VMC on AWS and Project Arctic use cases the vCenter upgrade is performed by VMware Site Reliability Engineers (SREs), so as a customer you don’t need to worry about triggering the upgrade/update of vCenter Server yourself. You can safely pass that burden on to the SREs. That alone can justify moving to VMware’s Project Arctic offering when it becomes available, IMHO.

Hopefully this post answers some questions. For more information refer to the official blog post here.

VUM Host upgrade fails – Another task is already in progress

I came across this issue a few times already, so I figured it would be good to share the findings, especially since it took me down the rabbit hole.

First off, the error “Another task is already in progress” is kind of obvious, yet it does not provide enough detail. I ran into this while trying to upgrade 4 hosts in a cluster via VUM; no tasks were running when the upgrade started. This was initiated from VMware Cloud Foundation. Thinking the issue was with VCF, I tried a direct upgrade from VUM, skipping VCF… but received the same error. I also tested with and without vSAN, NFS, etc.

The issue appeared to be a race condition where both the VUM remediation AND this “other task” were trying to run at the same time on that host. From the task console we could see another “install” operation kicking off as soon as VUM tried to remediate. The Initiator gave me a hint: vcHms “cut in line” and did an install operation.

I started looking into the esxupdate log (/var/log/esxupdate.log), and noticed that the host was trying to install a vib. But this vib was already installed, so what gives?
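If you want to catch this in the act, it helps to watch that log on the host while VUM kicks off the remediation. Nothing fancy, just a tail from the ESXi shell:

tail -f /var/log/esxupdate.log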

The name of the initiator and one of the vibs gave me a clue (vr2c-firewall.vib). I was running vSphere Replication on that particular cluster, so I started digging in. To validate my suspicion, I shut down the VR appliance and attempted to upgrade one host. The upgrade worked as expected with no errors, so I was pretty certain something within vSphere Replication was causing this.

First, I needed to SSH into the VR appliance or use the console. You won’t be able to SSH into the appliance before enabling SSH, so log in to the console with the root password and run the script to enable it:

/usr/bin/enable-sshd.sh

Within the appliance I looked into the config settings for HMS and discovered that there were 2 vibs set to auto-install at host reconnect. So it appears there is a race condition when VUM and VR are both trying to run an install task at the same time (at reconnect). Makes perfect sense now.

[Screenshot: HMS auto-install vib settings]

The ESXi update log indicated that vr2c-firewall.vib was the one trying to install. After re-checking the vibs (esxcli software vib list) on all the hosts, I could see the vib was already installed, but the task from VR kept trying to install it at reconnect.
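If you want to double-check this on your own hosts, something like the following from the ESXi shell should do it (the grep pattern is just the vib name from my environment, so adjust as needed):

esxcli software vib list | grep -i vr2c    # confirm the vib is already installed
grep -i vr2c /var/log/esxupdate.log        # see the repeated install attempts at reconnect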

As a workaround, I decided to disable the auto-install of this particular vib by running the following command from the /opt/vmware/hms/bin directory and then restarting the hms service:

./hms-configtool -cmd reconfig -property hms-auto-install-vr2c-vib=false

service hms restart

[Screenshot: hms-configtool output]
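After the restart, I like to make sure the hms service actually came back up before kicking off another remediation. Assuming the init script supports a status action (most do), a quick check looks like this:

service hms status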

This workaround worked like a charm, and I was able to upgrade the rest of the cluster using VUM. I did not find an official KB about this, and this is by no means an official workaround/fix.

Disclaimer: If you plan to implement this fix, be aware that this is not an official VMware blog, and changes to products may or may not cause issues and/or affect support.

vSAN 6.7 Upgrade Considerations

On April 17, 2018, VMware released vSphere 6.7. This includes vCenter, ESXi, and of course vSAN. A lot of people are looking to upgrade in order to take advantage of the new features and enhancements, primarily the HTML5 client… goodbye, Flash client! Links to more info on what’s new for vSphere and vSAN.

From an HTML5 client perspective, feature parity is about 95%. For vSAN alone it is about 99%, as we are only missing Configuration Assist, which is still available via the Flash client.

I see a lot of people getting confused and still believing that vSAN is a separate upgrade, just like traditional storage. Fortunately, vSAN is in the kernel, so once you upgrade ESXi you have also upgraded vSAN. Boom!!! Even though the version numbers may not match between ESXi and vSAN, they still go hand in hand. With that, it is important to understand the steps necessary for a vSAN upgrade.

Given the nature of vSAN, we need to follow the vSphere upgrade path. This includes checking the VMware Product Interoperability Matrices, not only for your current versions against the versions you are going to upgrade to, but also for all the other VMware products such as NSX, SRM, vROps, etc.

 

Upgrade Process Overview

From an upgrade process perspective, you have options. You can migrate your Windows vCenter to the vCenter Appliance (recommended). If you already have the vCenter Appliance, you can either do an in-place upgrade, or create a new vCenter if you want to migrate your hosts over to a fresh new vCenter install. Here is more info on vSphere upgrades.

  1. Upgrade PSC
  2. Upgrade, Migrate, or deploy new vCenter
    1. Depends on current version
  3. Upgrade ESXi
    1. This will also upgrade vSAN (easy, right?)
  4. Upgrade the vSAN on-disk format, or ODF (a quick version check from a host is shown after this list)
  5. Upgrade VMware tools
  6. Upgrade vDS
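For steps 3 and 4, a quick way to confirm what you are running is straight from a host’s ESXi shell. This is just a sanity check I use; if I remember right the per-disk output of the vSAN storage namespace includes the on-disk format version, but field names can vary a bit between releases:

esxcli system version get                            # ESXi version and build after the upgrade
esxcli vsan storage list | grep -i "on-disk format"  # current on-disk format version per disk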

 

As previously discussed, you will need to check the Product Interoperability Matrix to make sure all your products can run on vSphere 6.7. Don’t shoot from the hip and start upgrading before you have done proper research.

 

I mentioned the choice of migrating hosts to a new vCenter. This is something I do quite often in my labs, and it is a simple process.

Migration Process Overview

  1. Export vDS configuration (including port groups)
  2. Copy licenses from old vCenter
  3. Configure new vCenter
  4. Create Datacenter in vCenter
  5. Create a Cluster and enable vSAN on it
    1. If you currently have other services enabled, they will have to be enabled on the new vCenter as well prior to moving the hosts.
  6. Add licenses
    1. Assign license to vCenter
    2. Assign vSAN license to cluster asset
  7. Create vDS on the new vCenter
  8. Import configuration and port groups to new vCenter
  9. On the new vCenter, add the hosts (a quick vSAN membership check is shown after this list)
    1. No need to disconnect the hosts from the old vCenter; they will disconnect once they are added to the new vCenter.
    2. Confirm ESXi license or assign a new one.
  10. Connect the hosts to the vDS (imported)
    1. Make sure you go through and verify assignment of uplinks, failover order, vmkernel ports, etc.
  11. Lastly, you will need to tell vSphere that this new vCenter is now authoritative
    1. You will get an alert about this
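As a quick sanity check after moving the hosts (step 9), I like to confirm from one of them that they still agree on the vSAN cluster membership. This is only a host-side check, not a replacement for the health checks:

esxcli vsan cluster get    # shows the cluster UUID, member count, and this host's role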

 

[Screenshots: vSAN performance view and the HTML 5 client]

HTML 5 – vSphere and ESXi Host Web Clients

The wait is over (almost). Since the introduction of the vSphere Web Client, many admins have held off on adopting the Web Client, and even on updating vSphere, because of the performance of said client.

VMware has released a couple of flings in relation to this problem. One of them was the host web client, which lets you manage your hosts directly without the need to install the vSphere Client. This fling is now built into the latest update, vSphere 6.0 U2. A few days ago, VMware released a similar option for vCenter. Both of these options are based on HTML5 and JavaScript.

Host Web Client

Like I mentioned before, starting with vSphere 6.0 U2 the host web client is already embedded into vSphere. If you do not have this update, you can still download the fling and install it on the host to get the host web client that way. Currently it only works if you have vSphere 6.0+, but once version 5.5 U3 is released it will also work with that version. Here is a link to download the fling.

To access the web client, add “/ui” to the end of the name/IP address of your host, for example https://<host-name-or-IP>/ui
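If you just want to confirm the endpoint responds before opening a browser, a quick curl works too (-k because the host uses a self-signed certificate by default):

curl -k -I https://<host-name-or-IP>/ui    # expect an HTTP response from the host client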

The client is very responsive and has a nice UI. Not all the features are currently supported, but more will be coming at some point in the near future.

[Screenshot: ESXi host web client]

 

vCenter Web Client

This HTML5 web client is only available as a fling at the moment. You will need to deploy an OVA and register the appliance with the vCenter you would like to manage. Being a fling, not all features are included; it basically focuses on VM management, but I am sure they are working on porting all the features over at some point (I hope).

To deploy this OVA, you will need to enable SSH and the Bash shell on your VCSA. You can do both from the VCSA web UI. If you are running a Windows-based vCenter, refer to the Fling documentation here.

[Screenshot: enabling SSH and Bash shell from the VCSA web UI]
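If you prefer the console, the Bash shell part can also be enabled from the VCSA appliance shell. This is the 6.0-era appliancesh syntax as I remember it, so treat it as a starting point:

shell.set --enabled True    # enable the Bash shell from appliancesh
shell                       # drop into Bash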

Prior to going through the configuration, you will need to:

  1. Create an IP Pool (If deploying via C# Client)
    • Note: I deployed using the Web Client and it didn’t create the IP Pool for me automatically as it is supposed to, so double-check you have an IP Pool before powering on the appliance
  2. Deploy the OVA

[Screenshot: IP Pool configuration]

After deploying the OVA, creating an IP Pool, and enabling both SSH and Bash Shell on VCSA, it is time to configure the appliance.

  • SSH to the IP address you gave to the appliance using root as the user and demova as the password
  • Type shell to access Bash Shell
  • Run the following command in the Bash shell (a quick status check is shown right after this list):
    • /etc/init.d/vsphere-client configure --start yes --user root --vc <FQDN or IP of vCenter> --ntp <FQDN or IP of NTP server>
  • If you need to change the default shell for your root account to Bash, you can run the following command from the Bash shell
    • /usr/bin/chsh -s "/bin/bash" root
  • Answer the question by typing YES
  • Enter the credentials for your vCenter when prompted
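Once the configure script finishes, I like to confirm the client service actually started before trying to log in. I have not verified every build of the fling, but assuming the init script supports a status action, the check would be:

/etc/init.d/vsphere-client status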


[Screenshots: vsphere-client configure script output]

 

The HTML5 Web Client is pretty awesome, I gotta say, even if not all the features are there yet. It is super clean and responsive. I can’t wait for it to be embedded with a full feature set.

 

[Screenshots: the HTML5 vSphere Web Client UI]

Troubleshooting vSphere PSOD

The Screen of Death, as most of us know it, is the result of a system crash. Windows has its famous Blue Screen of Death (BSOD), and VMware has the Purple Screen of Death (PSOD). Of course there is also a Black Screen of Death, which usually appears when a Windows system is missing a boot file or one or more of those files have become corrupted. Although there is a range of colors, the questions for many are: How do I fix this? How do I know what caused it?

Many admins start with the obvious and simply reboot the machine hoping it was a hiccup, but chances are there is a bigger problem going on that needs to be addressed. In VMware, just like in other systems, a core dump file is created when the stop error is generated. This is where you start digging…

Where is my DUMP…file?!?

So, during the purple screen, the host writes the dump file to a previously created partition called VMKcore. There is a chance the core dump file won’t be written due to internal problems, so it is always a good idea to take a screenshot of the PSOD. Exporting the core dump file can be done via the CLI, manually from the vCenter file path (for both Windows and the appliance), or via the vSphere Client and Web Client; the latter is the preferred method for most admins since it is so simple to do.
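Before you even need to export anything, it is worth knowing where your dumps are configured to land. From the ESXi shell, these two commands show the dump partition and, on builds that support it, any file-based core dumps:

esxcli system coredump partition get    # active/configured dump partition
esxcli system coredump file list        # file-based core dumps (newer releases)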

To export the logs from vSphere Web Client, use the following steps:

  • Open vSphere Web Client > Hosts & Clusters > Right click on vCenter > Export System Logs…

[Screenshot: Export System Logs menu]

  • Choose the host that had the PSOD > Next

[Screenshot: selecting the host that had the PSOD]

  • Make sure you select CrashDumps; all others are optional

[Screenshot: selecting CrashDumps in the log selection]

 

Once you have the dump file (vmkernel-zdump….), it’s time to look for the needle in the haystack. There are a lot of entries and this file can be overwhelming, but don’t stress, it is quite simple to find what you need. The first logical step is to find the crash entry point. You can use the time when you noticed the PSOD, or you can simply search within the log file for “@BlueScreen”.

[Screenshot: finding the @BlueScreen entry in the dump file]
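Since the zdump is mostly binary, plain grep works if you force it to treat the file as text. This is just how I jump to the entry point; if the file happens to be compressed you will need to extract it first:

grep -a -i "@bluescreen" vmkernel-zdump.*    # -a treats the binary dump as text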

Once you find this, you will see the exact cause of the PSOD. In the screenshot below, you can see that the error generated relates to E1000. That should immediately make you think vNIC/drivers, and it is also worth searching online for any VMware KB articles about the errors shown. In this case, there is a known issue that has already been patched in the affected vSphere versions, so keeping up to date on patches is very important.

[Screenshot: PSOD backtrace referencing E1000]

 

The issue that triggered the PSOD in this environment was related to the fix (update) not being applied. The workaround was to stop using the E1000e NIC on the VM and use VMXNET3 instead. Also, you HAVE to install VMware Tools on your VMs; VMware Tools provides drivers your VM needs to work properly. In this particular instance, VMware Tools was not installed on the VM. Once the tools were installed and the vNIC was switched to VMXNET3, the issue was resolved.
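If you want to verify the Tools status without hunting through the UI, you can also check from the host’s ESXi shell with vim-cmd. The VM ID below is just a placeholder; grab the real one from the first command:

vim-cmd vmsvc/getallvms                           # list the VM IDs on this host
vim-cmd vmsvc/get.guest 42 | grep -i toolsStatus  # 42 is a placeholder VM ID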

 

Refer to VMware’s KB2059053 for more info.