Troubleshooting vSphere PSOD

VMware_PSOD

The Screen of Death, as most of us know it, is the result of a system crash. Windows has its famous Blue Screen of Death (BSOD), and VMware has a Purple Screen of Death (PSOD). Of course, there is also a Black Screen of Death, which usually appears when a Windows system is missing a boot file or one or more of those files have become corrupted. Whatever the color, the questions for many are the same: How do I fix this? How do I know what caused it?

Many admins start with the obvious and simply reboot the machine hoping it was a hiccup, but chances are there is a bigger problem that needs to be addressed. In VMware, just like other systems, a core dump file is created when the stop error is generated. This is where you start digging…

Where is my DUMP…file?!?

So, during the purple screen, the host writes the dump file to a previously created partition called VMKcore. There is a chance that the core dump file won’t be written due to internal problems, so it is always a good idea to take a screenshot of the PSOD. The core dump file can be exported via the CLI, manually from the vCenter path (on both the Windows version and the appliance), or from the vSphere Client and Web Client. The client route is the preferred method for most admins since it is so simple to do.

To export the logs from vSphere Web Client, use the following steps:

  • Open vSphere Web Client > Hosts & Clusters > Right click on vCenter > Export System Logs…

Sys_Logs

  • Choose the host that had the PSOD > Next

Sys_Logs_ESXi

  • Make sure you select CrashDumps; all others are optional

Sys_Logs_CrashDump

 

Once you have the dump file (vmkernel-zdump….), it’s time to look for the needle in the haystack. There are a lot of entries, and this file can be overwhelming to many people, but don’t stress: the relevant entry is quite simple to find. The first logical step is to find the crash entry point. You can use the time when you noticed the PSOD, or you can simply search within the log file for “@bluescreen”.
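The search described above can be sketched in a few lines of Python. This is only a rough illustration: it assumes the dump has already been exported to a readable text log at a path you supply, and the function name `find_crash_entries` and the `context` parameter are made up for this example. The match is case-insensitive, since the marker's capitalization in the log may differ from what you type.

```python
from pathlib import Path


def find_crash_entries(log_path, marker="@bluescreen", context=5):
    """Return a list of (line_number, surrounding_lines) for each marker hit.

    line_number is 1-based; surrounding_lines holds up to `context` lines
    before and after the matching line, so the crash cause can be read
    without scrolling through the whole file.
    """
    # errors="replace" keeps the scan going if the log holds stray bytes.
    lines = Path(log_path).read_text(errors="replace").splitlines()
    needle = marker.lower()
    hits = []
    for index, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[start:end]))
    return hits
```

Calling `find_crash_entries("path/to/exported.log")` and printing each returned block gives you the crash entry together with the lines around it, which is usually where the faulting module shows up.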

Find_@Bluescreen

Once you find this, you will see the exact cause of the PSOD. In the screenshot below, you can see that the error generated relates to E1000. That should immediately make you think of vNICs and drivers, and prompt a search for VMware KB articles covering the errors generated. In this case, it is a known issue affecting several versions of vSphere that has already been patched, so keeping up to date on patches is very important.

E1000_PSOD

 

The issue that triggered the PSOD in this environment was related to the fix not having been applied. The workaround was to use a VMXNET3 NIC on the VM rather than E1000e. Also, you HAVE to install VMTools on your VMs. VMTools includes drivers your VM needs to work properly. In this particular instance, VMTools was not installed on the VM. Once the tools were installed and the vNIC was switched to VMXNET3, the issue was resolved.

 

Refer to VMware’s KB2059053 for more info.

Deploying VCSA 6.0: Mind the Gap

VMware’s VCSA 6.0 brings a lot of enhancements compared to previous versions. I would seriously consider deploying VCSA in a production environment in order to replace the Windows flavor. For those not familiar with VCSA, this is the virtual appliance option for deploying vCenter in an environment. It reduces the time needed to deploy vCenter and offers an integrated database at no additional cost. Although this post may not be entirely technical, it will make you aware, before you invest too much time, of possible constraints that could prevent you from deploying VCSA.

One of the great things about deploying VCSA over the Windows vCenter is that you reduce cost by not having to deploy a Windows VM or purchase an MSSQL license. VCSA sounds great so far, but there are some gaps that you need to be aware of before deploying it in an environment.

 

VCSA_mind_the_gap

Some of the shortcomings of VCSA are primarily related to it not being a Windows VM. In some deployments, Windows vCenters have also been used to host the VUM (Update Manager) components, as well as programs that provide additional capabilities to the virtual environment, such as VSC for NetApp storage, among others. This means that you would still need to deploy a Windows VM to host VUM as well as VSC in this case. Even though you would still be deploying such a VM, an MSSQL server/instance is not required, which translates into savings.

Another aspect to keep in mind is the installation and migration from previous versions. There is no in-place upgrade from previous versions, but migrations are possible. With this in mind, you may want to consider just starting with a new, fresh environment. I would. The same applies to the Windows flavor. The installer now comes as an ISO image, which may cause some confusion. In order to deploy VCSA, the ISO is mounted on a Windows system (it can be your computer) and the installation is done remotely.

Before installation, make sure you install the Client Integration Plugin located within the ISO under the vcsa folder.

VCSA_CIP

Start the installation by launching the vcsa-setup.html file from the ISO. A Web UI opens after a few seconds and gives you the options to Install and “Upgrade” (migrate). During installation, just provide the target host information and the rest of the information needed for the installation. Make sure the VCSA appliance has a proper network connection and that you can reach it from the computer deploying the appliance.
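One quick way to confirm reachability from the deploying computer is a plain TCP connection test. The following is a minimal Python sketch, not part of the VCSA installer: the function name `is_reachable` is hypothetical, and checking port 443 (HTTPS) is an assumption about which service you want to verify.

```python
import socket


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_reachable("vcsa.example.local")` (a placeholder hostname) returns False if DNS, routing, or a firewall is in the way, which is worth sorting out before you start the deployment wizard.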

vcsa_setupvcsa_UI

Both Windows and appliance vCenter offerings have the same scalability numbers as it relates to hosts, VMs, clusters, etc.

In conclusion, VCSA is a great choice for vCenter, but just be aware of some of the constraints of not using the Windows option. By the way, the Web UI in vSphere 6 is soooo much faster!!! I’m just saying.

 

vSphere 6 Availability Enhancements

With the introduction of vSphere 6, many new enhancements have been introduced. Given that IT is primarily delivered as a service within a business, the availability of our environment is often high priority. This new version of vSphere introduces the following enhancements:

  • Better vMotion Capabilities
  • Multi-Processor Fault Tolerance (FT) (up to 4 vCPUs)
  • App HA now supports more applications
  • vSphere Replication has better RPO (15 minutes) and scalability (2000 VMs)

There are other availability enhancements in vSphere 6, but the previous list really caught my attention, specifically the vMotion capabilities. In previous versions, moving VMs between vCenters was a little cumbersome and required a lot of manual intervention, such as scripts or even downtime. With vSphere 6, VMs can be moved not only across vCenters and datacenters, but also across long distances (greater than 100 ms round-trip time). It is also now possible to perform vMotion tasks across virtual switches. However, it is important to understand that the vCenters have to be part of the same SSO domain for this to work.

What does all this mean to me? Well, in my opinion, these enhancements can be extremely handy for disaster prevention exercises. Take a scenario where there is advance notice of a hurricane or flood. Let’s assume that a stretched VLAN or VXLAN has been configured across 2 data centers with a reasonable RTT (about 100 ms or less). In this case, the option exists to move some powered-on VMs to another vCenter within the same subnet in order to prevent downtime for the business. Of course, this can also be accomplished with SRM if it is already implemented.
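The two preconditions from this scenario (same SSO domain, reasonable RTT) can be expressed as a simple pre-flight check. The Python below is only a sketch of that reasoning and does not call any VMware API: the `VCenter` class, the function name, and the default threshold of 100 ms (taken from the scenario above, not from VMware’s documented limit) are all assumptions you should adjust.

```python
from dataclasses import dataclass


@dataclass
class VCenter:
    """Hypothetical stand-in for a vCenter; not a real SDK object."""
    name: str
    sso_domain: str


def cross_vcenter_vmotion_ok(src, dst, rtt_ms, max_rtt_ms=100.0):
    """Return (ok, reasons): ok is True only when no precondition fails."""
    reasons = []
    # Both vCenters must be part of the same SSO domain.
    if src.sso_domain.lower() != dst.sso_domain.lower():
        reasons.append("vCenters are not in the same SSO domain")
    # Measured round-trip time must stay within the chosen limit.
    if rtt_ms > max_rtt_ms:
        reasons.append(f"RTT {rtt_ms} ms exceeds limit {max_rtt_ms} ms")
    return (not reasons, reasons)
```

Feeding it a measured RTT and the two vCenters gives you either a go-ahead or a list of reasons to fix first, which is handy to run before the storm is actually on its way.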

These enhancements, as well as the ones in the network, management, and storage realms, make vSphere 6 impossible to ignore and set VMware apart from its competitors.

vSphere 6 Web Client: Yes, Let’s go there…

With vSphere 5.1, VMware introduced the new Web Client. Yes, there was another web client before it, but it was not widely used. A lot of people questioned the move toward a web interface, so here are some of the reasons for the Web Client:

  • Access from any device with Web access
  • No need to install binaries in multiple locations to access the vSphere environment
  • Multi OS friendly
  • Scalable solution
  • API friendly

This first version was well received by many, but others noticed slow response times within the browser. Well, I am happy to say that the new Web Client in vSphere is anything but slow. I know for a fact that the VMware team has spent countless hours working to get the slow response issue resolved. I was privileged to be part of a private customer Alpha test for vSphere x.y, and the difference made from the Alpha up to Beta 2 has been tremendous. I had the chance to voice concerns in many areas, and obviously the Web Client was one of them. Let me tell you, VMware listens very well and does whatever needs to be done to make customers happy.

I will list some of the changes to the Web Client that I believe most customers will REALLY like.

  • Fast response times for Web Client interaction
    • Very noticeable
  • Faster log on process
  • Browser Friendly
    • Previous version had best results using Google Chrome
  • Recent Tasks (at bottom) is back
  • Drop down menu from home icon for easy, 1-click navigation
  • Core items added to left pane (Networking, Storage, VMs, Hosts)
  • vCenter Inventory Lists
  • 1-click task filtering

 

These are some of the many improvements in the new vSphere release that will satisfy the requests of many customers. I was extremely impressed by the speed of the Web Client, but the additional features are icing on the cake.

As you may infer, the “fat client” will play a small to non-existent role moving forward. The C# client may still be used to access individual hosts, and it has read-only capabilities for objects with virtual hardware version 9 and above, but vCenter tasks will have to be done through the new and improved Web Client. Based on the huge improvements and new features, I don’t think many people will miss the old client.

Web_Client

VVols: All Systems Go

After a long wait and development/marketing effort from VMware, VVols are finally ready to take over your datacenter(s).

VVols are the next-generation integration between vSphere and storage arrays. VVols leverage a new set of APIs (VASA) that allow vSphere to communicate with the array and provide additional features at the VM level. VVols are based on storage policies, which in turn allow for further automation between products.

The storage abstraction provided by VVols allows for the control of storage not only at the VM level but also at the VMDK level. This is a great feature, as you can now control VMDKs as separate entities. The connections between the hosts and VVols are made through an abstraction layer known as Protocol Endpoints, which gives the user the freedom to use several protocols at once, such as FC, iSCSI, or NFS.
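To make the idea of per-VMDK policies concrete, here is a toy Python model. It is purely illustrative and has nothing to do with the actual VASA API: the class names, the capability strings, and the `compliant_disks` function are all invented for this sketch. Each disk carries its own policy, and a disk is compliant when the array advertises every capability that policy requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy: a name plus the capabilities it requires."""
    name: str
    required: frozenset


@dataclass
class Vmdk:
    """Hypothetical virtual disk with its own policy attached."""
    name: str
    policy: StoragePolicy


def compliant_disks(disks, array_capabilities):
    """Map each disk name to True if the array meets that disk's policy."""
    caps = set(array_capabilities)
    # Subset test: every required capability must be offered by the array.
    return {disk.name: disk.policy.required <= caps for disk in disks}
```

The point of the model is that two disks belonging to the same VM can carry different policies and therefore get different answers, which is exactly the kind of granularity a LUN-level datastore cannot give you.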

There are a few requirements for VVols. One of them is that the array vendor must support VVols, which includes providing the vendor’s VASA APIs and meeting any other vendor requirements. In the case of a storage array vendor such as NetApp, VSC is also required.

The Policy-Based Provisioning provided by VVols brings us even closer to the Software Defined Data Center (SDDC).

 

VVOLS