NSX Advanced Load Balancer configuration for vSphere with Tanzu

One of the pre-requisites for vSphere with Tanzu is the ability to provide a Load Balancer to the environment. THe options as of vSphere 7.0 U1 included NSX or HAProxy appliance. In vSphere 7.0 Update 2, a new option for load balancer is available for the deployment of Workload Management. The NSX Advanced Load Balancer (ALB) also known as AVI, is available for download in OVA format from my.vmware.com. Deploying ALB will allow for communication between users, service engines, load balancer type services with supervisor and TKG clusters.

Let’s jump into it. First download the OVA for NSX Advanced Load Balancer from my.vmware site.

Once the OVA has been downloaded, proceed to your vCenter and deploy the OVA by supplying a management IP address.

Supplying a sysadmin login authentication key is not required.

Once the appliance has been deployed and powered on, login to the UI using the supplied management IP/FQDN.

Create username and password. Email is optional.

Add DNS , NTP, and backup passphrase information.

If you provided email address, you would also need to provide email settings.

You will also need to identify which Orchestrator Integration will be used. Select VMware vCenter/vSphere.

The appliance needs to know how to connect to your vCenter. Supply the username, password and vCenter information so that ALB can connect to vCenter. For permissions, you can leave “Write” selected, as this will allow for easier deployment and automation between ALB and vCenter. Leave SDN Integration set to “None”.

Select the Management PortGroup, IP subnet, gateway, and IP address pool to be utilized. This IP Pool is a range of IP to be used to connect the Service Engine (SE) VMs.

After the initial configuration, we will need to either import a certificate or create a self-signed certificate to be used in Supervisor cluster communication. For PoCs, creating a self-signed certificate is perfectly fine.

Log in to the appliance using the user/password combination specified during initial deployment.

Navigate to Administration by selecting this option from the drop-down menu on the upper left corner.

In the administration pane, select Settings.

Click on the caret under SSL/TLS Certificate to expand the options. Click on “Create Certificate” green box.

Create a self-signed certificate by providing the required information. Make sure to add Subject Alternate Name(s).

The next step is to configure the data network. Change from the administration tab to Infrastructure by selecting the option from the drop-down menu on the upper left corner.

From the Infrastructure pane, click on Networks. ALB will talk to vCenter and retrieve available networks.

Ideally, you’ll want to create a separate PortGroup for your data network. In this example, this is the Frontend network. The Load Balancer VIPs will be assigned to Service Engines from this network. These VIPs are the interfaces that connect users to clusters and Kubernetes services.

Select the Port Group for the data network, specify an IP address, and add a static IP address pool to be used.

Because service engines (SE) don’t have interfaces on the workload network that Kubernetes cluster nodes are connected to, we need to specify a route. Note: VIPs are on a network accessible to the client, but the workload network may not be, so these routes enable traffic to get to the clusters.

To add a route, Navigate to Infrastructure > Routing > Create

In this example, the is the Workload Network, and the next hop is the gateway for the VIP network.

Next, we need to create an IPAM Profile. This is needed to tell the controller to use the Frontend network to allocate VIPs via IPAM.

Navigate to Templates > Profiles > IPAM/DNS Profiles > create

Via IPAM profile, change the cloud for usable network to Default-Cloud, and set the usable network to the VIP network, in this case vDS-WCP-Frontend.

At this point your NSX Advanced Load Balancer Appliance is ready to be used during the configuration of Workload Management.

Dave Morera

VCF Cloud Builder: REUSE

When deploying VMware Cloud Foundation (VCF), we deploy an appliance (ova) called Cloud Builder. This appliance allow us to load the parameter file an automate the deployment of the entire infrastructure, allowing us to go from spreadsheet to full SDDC.

Cloud Builder also includes the VMware Imaging Appliance that you can leverage to build your ESXi hosts. Just navigate to https:// Cloud_Builder_VM_IP:8445/via and login with the admin credentials you set during Cloud Builder ova deployment. There are some documents out there stating to use root and EvoSddc!2016 as password but that most likely not work, at least not on newer versions available. More on VIA here.

Anyway, once the bringup of VCF has been completed, Cloud Builder will most likely not be used again, you can safely remove it. However, if you plan to deploy other VCF environments or, like me, is constantly deploying VCF on lab environments you can “refresh/reuse” Cloud Builder via a couple of options… See what I did there?!? via … like the imaging appliance.. never mind.

So the first option is to simply take a snapshot of Cloud Builder after the initial deployment of the ova and BEFORE you login to it and start importing the parameter file. Once you are done using Cloud Builder, you can simply revert the snapshot back and you can reuse the appliance again.

If you prefer not to take the snapshot or forgot to do it at the beginning, no worries. We can run a command within Cloud builder to remove previous entries and REUSE Cloud Builder.

  • su –
  • psql –host=localhost -U postgres -d bringup
  • delete from execution;
  • delete from “Resource”;
  • \q

Disclaimer: this is most likely not officially supported and hacking at the db is done at your own risk. Since this is Cloud Builder the risk is low and you can always deploy a new appliance if you mess it up, but still be careful.

That’s it, this will remove the previous entries and next time you log in to Cloud Builder it will feel like you just deployed the appliance. Meaning that you have to go through the initial prompts, such as accepting the EULA, selecting the platform to be used, etc.


vSAN Encryption at Rest & In Transit: What is the difference?

In the past, I’ve written a few posts about vSAN Data-at-Rest Encryption, which became available with vSAN 6.6. You can find those posts here. In vSAN version 7.0U1 there is a new option for encryption, Data-In- Transit Encryption. So what is the difference? Can I only choose one or both? Let’s find out.

vSAN Data at Rest Encryption

Data-at-rest (D@RE) was designed to do just that. Encrypt all your data once it lands on the disks being used by vSAN. This will work regardless the Storage Policy you choose, and all the data replicas will be encrypted at both the cache layer and the capacity layer. One major advantage of Data-at-Rest Encryption over the vSphere VM encryption is that vSAN will still allow you to encrypt your data and take advantage of space saving features such as deduplication and compression. When the data lands in cache it will be encrypted using the Data Encryption Key (DEK), then while the data is being destaged to the capacity layer it will be decrypted, and it is here where the deduplication and compression takes place. Finally when the data lands in the capacity devices, the data gets encrypted once again. It is also important to highlight that the DEK is protected by the Key Encryption Key (KEK) which is coming from the Key Management Server (KMS)… and this is one of the differences between the two options.

vSAN Data in Transit Encryption

Data-In-Transit Encryption (DIT) comes in to complete the end-to-end encryption of the data while in transit between hosts. Data-at-Rest encryption only encrypts the data when it lands on disk, so if someone takes a disk out of a server, all data is encrypted. But what about other attacks such as Man-in-the-middle attacks? Well, this is where Data-In-Transit encryption can protect the data. The keys used for DIT encryption are managed internally and there is no need for a KMS. Such keys are also rotated much, much faster when compared with D@RE. DIT encryption keys are rotated weekly by default, but you can change this option and rotate keys either every 7 days or every 6 hours or something in between. Just like D@RE encryption, DIT encryption works at a vSAN cluster level; so either all the hosts are protected or none.

Here is a quick comparison between the two options


Can I enable both at the same time?

Yes. You can enable Data at rest and Data in Transit encryption in order to get full protection in your vSAN environment. It is recommended to enable vSAN Data at Rest encryption in the early stages of the cluster to minimize the time for on-disk formatting as there is less data to move around.

What is the performance impact of turning encryption on?

There are a lot of variables that come in to play when we talk about performance. However; vSAN encryption (both) will take advantage of AES-NI and offload operations in order to reduce any performance hit. Most modern CPU have AES-NI, but sometimes this feature is not enabled, so make sure to check this at deployment. Please also be mindful that enabled D@RE when the cluster has a lot data in it will result in large amounts of data being moved, so plan this to be done during off hours if possible.

What vSAN License do I need to enable vSAN Encryption?

In order to enable Data-at-Rest and/or Data-In-Transit Encryption you will need vSAN Enterprise or vSAN Enterprise Plus licenses. Refer to licensing guide here.

How do I enable Data-In-Transit Encryption?

Enabling DIT encryption is easy. Within the vCenter UI, select the vSAN cluster > Configure > Services > Data-In-Transit can be enable with or without Data-at-Rest encryption. Here is where you can also change the key rotation schedule for the DIT encryption keys.


vSAN Encryption KMS info retrieval

A few years ago I wrote a blog post about “Replacing vCenter with vSAN Encryption Enabled“. For this particular exercise, one key piece of information needed to be retrieved was the kmipClusterId.

A couple of things have changed since then, in newer version of vSAN.

Change #1: ESXCLI commands

An easier way to retrieve this information with esxcli command was added. This command allows you to obtain a lot of information about the state of vSAN encryption, retrieve the hostKeyId, kekID, etc.

esxcli vsan encryption <option> get/list


So, based on this addition, you can now get the kmipClusterId needed for vCenter replacement by using esxcli vsan encryption kms list

As you can see, you can still look for this information on the esx.conf file which is where the hosts store this information for this particular version of vSAN (6.7 P01 – Build 15160138). Which brings me to the second update…


Change #2: vSAN Persistence

In vSAN 7.0 and beyond some changes were made on how this configuration gets stored. In this case, the encryption information that was previously file based (esx.conf) is now stored in a database. This provides better concurrency for multiple readers and writers versus the file based esx.conf option, among other advantages.

The good news is that the esxcli vsan encryption command will still allow you to retrieve the information needed in regards to encryption. However, if you attempt to retrieve this information from the esx.conf file, you won’t be able to find it there anymore.

Alternatively, you can retrieve the information directly from the config-store… maybe more info than you need. So, I’ld just stick to esxcli commands.

VCF Lab Tips: NSX Cluster size

VMware Cloud Foundation (VCF) is quickly becoming the go-to for many companies. The operations efficiency it brings, along with its best practices driven architecture is a no-brainer when it comes to value. As with any purchases, many people like to kick the tires on a new product, or just want to get familiar with it via Proof of Concepts, virtual labs, home labs, etc. Testing VCF is a great way to learn it, but because it uses best practices (VMware Validated Designs), some decisions are made for you, one of them is the NSX Cluster.

To make matters simple, I will refer to VCF 4.0+, where NSX-T is used for Management and VI Workload Domains… no more NSX-V. To deploy VCF we use a worksheet we can download from my.vmware.com 

This worksheet will deploy 3 NSX-T Managers and create a cluster under a Virtual IP (VIP). The NSX-T Managers are “t-shirt” sizes, by default deploying Medium NSX-T Appliances, but they can be changed to either Large or Small on the worksheet.


As you can see from the worksheet, it requires 3 NSX-T Managers to be deployed. So here is where we can use other avenues to reduce that resource consumption.

TIP 1:

If you wish to deploy all 3 NSX-T Managers, you can change the size to small on the worksheet in order to reduce the resource footprint, prior to VCF bring-up.


TIP 2:

This second option allows for setting the size to small and at the same time allows to also create a single node cluster. This can be done by using a json file during VCF bring-up rather than using the worksheet. Within the json file, remove any additional entries of NSX-T Managers and leave only one node.

For additional information on how to obtain the json file, you can find the procedure here.


TIP 3:

Another option relates to a post bring-up procedure. In the case that VCF has already been stood up, and resources want to be minimized within the lab, the option to remove nodes from the NSX Cluster would be a viable solution. Removing nodes from the NSX cluster can be done from CLI within the NSX cluster.

It is necessary to SSH into one of the cluster nodes in order to remove nodes from the NSX cluster. If unable to SSH, verify that the AllowRootAccess is enabled and StrictMode is set to no. Then restart the ssh service with the following command:

/etc/init.d/ssh restart

Then ssh into that node using the admin account. Once logged in, there are a list of command available, including get and detach.


Use the GET command to get the ID of the cluster nodes.

get cluster status


Use the ID along with the detach cmd to remove a specific node. Repeat the process to remove the the second node until there is only one left.

detach node <node-id>


I want to reiterate that this is a good resource saving workaround on a LAB environment. For production environments, please follow the already applied recommendations/best practices for deploying VCF.