VCF Lab Tips: NSX Cluster size

VMware Cloud Foundation (VCF) is quickly becoming the go-to platform for many companies. The operational efficiency it brings, along with its best-practices-driven architecture, makes it a no-brainer when it comes to value. As with any purchase, many people like to kick the tires on a new product, or simply want to get familiar with it through Proofs of Concept, virtual labs, home labs, etc. Testing VCF is a great way to learn it, but because it follows best practices (VMware Validated Designs), some decisions are made for you. One of them is the NSX cluster.

To keep things simple, I will refer to VCF 4.0+, where NSX-T is used for both the Management and VI Workload Domains… no more NSX-V. To deploy VCF, we use a worksheet that can be downloaded from my.vmware.com.

This worksheet deploys 3 NSX-T Managers and creates a cluster behind a Virtual IP (VIP). NSX-T Managers come in “t-shirt” sizes: Medium NSX-T appliances are deployed by default, but you can change the size to Small or Large on the worksheet.

 

As you can see from the worksheet, 3 NSX-T Managers must be deployed. This is where we can take a few other avenues to reduce that resource consumption.

TIP 1:

If you want to keep all 3 NSX-T Managers, change their size to Small on the worksheet prior to VCF bring-up to reduce the resource footprint.

 

TIP 2:

The second option lets you set the size to Small and, at the same time, create a single-node cluster. This is done by using a json file during VCF bring-up instead of the worksheet. In the json file, remove the additional NSX-T Manager entries and leave only one node.

For additional information on how to obtain the json file, you can find the procedure here.
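
The Tip 2 json edit can also be scripted instead of done by hand. The sketch below is a hypothetical helper (trim_nsxt_managers is my own name, not a VCF tool). It assumes the bring-up json has a top-level nsxtSpec object containing an nsxtManagers list and an nsxtManagerSize field, so double-check those key names in your own file first. It keeps the first manager entry, sets the size to small, and writes an edited copy, using the python3 that Cloud Builder already has.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first NSX-T Manager entry in a VCF
# bring-up json file and set the appliance size to "small".
# ASSUMPTION: the file has a top-level "nsxtSpec" object with an
# "nsxtManagers" list and an "nsxtManagerSize" string.
trim_nsxt_managers() {
    in_file="$1"    # json file generated from the worksheet
    out_file="$2"   # edited copy to feed to Cloud Builder
    python3 - "$in_file" "$out_file" <<'EOF'
import json
import sys

src, dst = sys.argv[1], sys.argv[2]
with open(src) as f:
    spec = json.load(f)

nsxt = spec["nsxtSpec"]
nsxt["nsxtManagers"] = nsxt["nsxtManagers"][:1]  # single-node NSX cluster
nsxt["nsxtManagerSize"] = "small"                 # smallest t-shirt size

with open(dst, "w") as f:
    json.dump(spec, f, indent=2)
EOF
}
```

For example, trim_nsxt_managers vcf-public-ems.json vcf-lab.json writes the trimmed copy to vcf-lab.json (a hypothetical file name). Keeping the first entry is an arbitrary choice; reorder the list beforehand if you want to keep a different node.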

 

TIP 3:

The third option is a post bring-up procedure. If VCF has already been stood up and you want to minimize resource usage in the lab, removing nodes from the NSX cluster is a viable solution. This is done from the CLI of one of the NSX cluster nodes.

To remove nodes from the NSX cluster, you need to SSH into one of the cluster nodes. If you are unable to SSH, verify that AllowRootAccess is enabled and StrictMode is set to no, then restart the SSH service with the following command:

/etc/init.d/ssh restart

Then SSH into that node using the admin account. Once logged in, a list of commands is available, including get and detach.

 

Use the get command to obtain the IDs of the cluster nodes:

get cluster status

 

Use a node ID with the detach command to remove that specific node. Repeat the process for the second node, until only one node is left:

detach node <node-id>
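
With a three-node cluster, detach has to be run twice. The hypothetical helper below (detach_cmds is not an NSX command) runs in a regular POSIX shell on your workstation, not inside the NSX CLI. It prints one detach line per node ID taken from get cluster status, skipping the node you want to keep, which I assume is the node you are logged into.

```shell
#!/bin/sh
# Hypothetical helper: print one "detach node <node-id>" line for every
# node ID except the one to keep. Paste the lines into the NSX admin CLI
# one at a time. Node IDs come from "get cluster status".
detach_cmds() {
    keep_id="$1"   # node that should remain in the cluster
    shift
    for node_id in "$@"; do
        if [ "$node_id" = "$keep_id" ]; then
            continue
        fi
        echo "detach node $node_id"
    done
}
```

For example, detach_cmds <id-to-keep> <id-1> <id-2> <id-3> prints two detach lines, one for each node that is not being kept.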

 

I want to reiterate that this is a good resource-saving workaround for a LAB environment. For production environments, please follow the recommendations and best practices that VCF already applies during deployment.

VCF: Generate JSON File from Excel Spreadsheet

Performing VCF bring-up involves “feeding” Cloud Builder all the information needed to deploy all the components automagically for you, including vCenter, vSAN, NSX, and SDDC Manager, and to configure them all to actually work together. Sounds too good to be true, but it is… it is true indeed.

I personally like to use json files because they make it easier to replace IPs, passwords, etc., as well as to change the size and number of components. More on that later…

Once the Cloud Builder appliance is deployed, you can use either the completed worksheet or a json file. Cloud Builder provides a Python script to convert your Excel spreadsheet into a json file.

There is an official document on this procedure, and probably a couple of blog posts out there; however, there was a recent move to Python3, so the syntax has changed a bit.

Here are the steps to generate the json file with python3:

  • Use a file transfer utility (WinSCP) to copy the Excel file to Cloud Builder.
    • Log in with the Cloud Builder admin account
    • Copy/paste the Excel file from your computer to /home/admin

  • Copy the Excel file from /home/admin to /opt/vmware/sddc-support, using sudo to gain access to the destination (or switch to root with su)
    • sudo cp <file-name.xlsx> /opt/vmware/sddc-support

 

  • Change directory to /opt/vmware/sddc-support and verify the xlsx file was copied successfully

  • Then run the following command to generate the json file
    • sudo python3 -m cloud_admin_tools.JsonGenerator.JsonGenerator -t cloud_admin_tools/JsonGenerator/template -d vcf-public-ems -i <file-name-path.xlsx>
    • Use -h to see the other available flags
    • You should see the json file being generated
  • Navigate to your output location or, if you used the default, to /opt/vmware/Resources/<design chosen (vcf-ems, vcf-public-ems)>
  • Verify that the json file is there; you can then read it
    • cat vcf-public-ems.json
  • You may need to switch to the root user to access the output directory.

 

  • You can also set an output directory with the -o or --output flag
    • In this example, I set the json output directory to /home/admin/

 

  • You can now export the file and use it as the input file during the Cloud Builder bring-up workflow
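
If you repeat this often, the steps above can be wrapped in a small script. This is my own sketch, not a VMware-provided tool: generate_vcf_json and run are hypothetical names, the JsonGenerator command line is the one shown above plus the -o flag, and vcf-public-ems is just the default design I picked, so adjust it to your worksheet. By default it only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch of the Cloud Builder steps above (copy, generate, list output).
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 runs it.
# Meant to be sourced on the Cloud Builder appliance as the admin user.
SUPPORT_DIR=/opt/vmware/sddc-support

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

generate_vcf_json() {
    xlsx="${1:?usage: generate_vcf_json <file.xlsx> [output-dir] [design]}"
    out_dir="${2:-/home/admin}"     # value for the -o flag
    design="${3:-vcf-public-ems}"   # value for the -d flag
    input="$SUPPORT_DIR/$(basename "$xlsx")"

    run sudo cp "$xlsx" "$SUPPORT_DIR/" || return 1
    run cd "$SUPPORT_DIR" || return 1
    run sudo python3 -m cloud_admin_tools.JsonGenerator.JsonGenerator \
        -t cloud_admin_tools/JsonGenerator/template \
        -d "$design" -i "$input" -o "$out_dir" || return 1
    run ls -l "$out_dir"
}
```

After sourcing it, run generate_vcf_json /home/admin/<file-name.xlsx> once to review the commands, then DRY_RUN=0 generate_vcf_json /home/admin/<file-name.xlsx> to execute them. Afterwards, python3 -m json.tool <generated-file>.json > /dev/null is a quick way to confirm the output parses as valid json.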

@GreatWhiteTec

Devices unavailable for vSAN

I’ve been running into this scenario quite a bit lately, so I figured I’d write something about it; it also helps me refresh my memory.

Lately I’ve been helping out with a lot of VMware Cloud Foundation Proof of Concepts (VCF POCs)… That’s a mouthful!

Upon inspecting the environments I am supposed to work on, I have found hosts that were once part of a vSAN cluster but were not properly cleaned up before ESXi was rebuilt.

The vSAN clean up process involves the following steps:

  1. Set host in Maintenance Mode
  2. Delete Disk Group(s)
    • This removes vSAN partitions as well
  3. Remove host from vSAN cluster
  4. Clean up network
  5. Remove from vCenter

As previously mentioned, I usually get called in after the vCenter is gone and hosts have been re-imaged. Now what?!?!

The steps to manually clean the hosts involve the following tasks:

Delete Disk Groups via CLI

Since we do not have access to vCenter in this case, we can delete the Disk Group via CLI using esxcli commands.

You can remove a disk group by specifying its SSD cache device, which removes that device along with all of its backing capacity devices, or you can remove one device at a time by using the device’s UUID.

To remove the devices one at a time, you can use the vdq -q command to list the devices and then run esxcli vsan storage remove -u <uuid>. I prefer the other method since it is a lot faster…

  • List devices by using vdq -i

  • Run esxcli vsan storage remove -s <cache device>
    • In this case, we run this command for naa.55cd2e404b66fcbb and naa.55cd2e404b4393b9
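
When a host has several disk groups, the same command is simply repeated per cache device. The sketch below is a hypothetical wrapper (remove_disk_groups is my own name for it) that only prints the esxcli commands unless DRY_RUN=0 is set, which gives you a chance to double-check the device names against the vdq -i output before anything is deleted. ESXi ships a BusyBox-based sh, so the script sticks to plain POSIX shell.

```shell
#!/bin/sh
# Hypothetical wrapper: delete each disk group by removing its cache device
# (esxcli also removes the capacity devices behind it).
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
remove_disk_groups() {
    for cache_dev in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "esxcli vsan storage remove -s $cache_dev"
        else
            esxcli vsan storage remove -s "$cache_dev" || return 1
        fi
    done
}
```

For example, remove_disk_groups naa.55cd2e404b66fcbb naa.55cd2e404b4393b9 prints the two commands from this example; re-run it with DRY_RUN=0 once the names match your own vdq -i output.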

 

Now What?!?!

Now that we have deleted the disk groups, we need to make sure that the host doesn’t think it is still part of a vSAN cluster.

Remove Host from vSAN Cluster via CLI

  • Check vSAN cluster membership
    • esxcli vsan cluster get
  • If it belongs to a cluster, remove it
    • esxcli vsan cluster leave
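
The check and the leave can be combined so the host only leaves when it still reports cluster membership. This sketch assumes that on a member host the esxcli vsan cluster get output contains a line like "Enabled: true", which is what I normally see; verify that on your ESXi build. in_vsan_cluster and leave_vsan_if_member are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for the membership check above.
# ASSUMPTION: on a vSAN member, "esxcli vsan cluster get" prints a line
# such as "   Enabled: true"; otherwise it reports clustering is not enabled.

# Succeeds if the text on stdin looks like cluster-membership output.
in_vsan_cluster() {
    grep -Eq '^[[:space:]]*Enabled:[[:space:]]*true'
}

# Leaves the vSAN cluster only when the host still reports membership.
leave_vsan_if_member() {
    if esxcli vsan cluster get 2>/dev/null | in_vsan_cluster; then
        echo "Host still belongs to a vSAN cluster, leaving it..."
        esxcli vsan cluster leave
    else
        echo "Host is not a vSAN cluster member, nothing to do."
    fi
}
```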

Clean up Network configuration

Since the hosts were re-imaged, the network configuration should already be back to default. In case it was not cleaned up, or was manually re-created, you can clean it up by resetting the configuration.

The following command resets the root password, reverts all network configuration changes, and reboots the host. Don’t do this unless you are planning to reset the host to its defaults… You’ve been warned.

/bin/firmwareConfig.py --reset

 

At this point, you can use the cleaned host to join a new vSAN cluster, commission it as a host in VCF SDDC Manager, or use it as part of the VCF bring-up process for the Management Domain.