VMware Cloud Foundation for Cloud Providers – Lessons Learned

One of the tools I’ve been working with recently is VCF 4.x. As an Enterprise or Cloud Provider admin, VCF makes your life easier by providing a central point of management known as the SDDC Manager. Keeping all your clusters and their components updated becomes a trivial task.

Below I will share some valuable tips, best practices, and lessons learned from real-life experience working with VCF in a Cloud Provider environment. If you are starting with VCF or plan to deploy it in the future, this guidance will come in handy. I will not go into detail about the initial VCF deployment (maybe in future posts), since several blogs already cover those topics.

Hardware specs standardization

It is optimal if your hardware comes from the same manufacturer with similar specs across all the clusters managed by VCF. Mixing hardware can get challenging when upgrading vSphere. You can define a custom OEM ISO for the updates, but it will apply to all of the clusters. Alternatively, you can define a directory for async drivers and use the VCF bundle ISO for ESXi. To be honest, I haven’t tested my environment with a mix of hardware vendors.

vSAN requirements

If you are repurposing hardware for your VCF deployment, the vSAN requirements apply only to the Management Domain. You can deploy Workload Domains (tenants) without vSAN, using centralized storage via FC, NFS, or iSCSI instead. If your budget is limited, this works in your favor.

Regarding our Cloud Provider architecture, we acquired four (4) physical hosts for the Management Domain and repurposed existing hardware for the Workload Domains (compute for tenants). However, our Management hosts did not have local storage for vSAN. We compared their specs against vSAN Ready Nodes for the same server type, then decided to provision the missing components: both cache and capacity tier drives. We were ready to go!

If your vSAN cluster meets only the minimum capacity requirements and hardware limitations leave you no room to grow, you can still deploy vSAN: once the initial deployment is done, you can move some workloads (or deploy new ones) onto legacy SAN-backed datastores if needed.

Standard configuration and number of pNICs

This might be one of the first things to address when architecting and defining the proper setup. VCF only supports configurations limited to 2 pNICs when working from the GUI. This is acceptable in most cases, but if you require three or more pNICs, then the VCF configuration and deployment need to be performed via the API.

With the Management Domain deployment, you can define the number of pNICs in the primary vDS, or deploy an optional secondary vDS, using the Hosts and Networks tab of the deployment workbook. Be aware that if you deploy the primary vDS with more than 2 pNICs, adding more hosts to the cluster needs to be performed via the API. If you use the GUI, the setup task will fail, and the failed task is not easy to cancel, not even through the VCF API.
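To give a feel for what the API route involves, here is a minimal sketch of a cluster-expansion payload that explicitly maps all four pNICs to the vDS. The field names, host ID, and vDS name are illustrative placeholders; verify the exact schema against the VCF API reference for your release before using anything like this.

```python
import json

def build_expansion_spec(host_id, vds_name="sfo-m01-vds01"):
    # Hypothetical shape of a VCF cluster-expansion spec; the GUI only
    # ever generates two vmNic entries here, which is why >2 pNICs
    # forces you onto the API.
    return {
        "clusterExpansionSpec": {
            "hostSpecs": [
                {
                    "id": host_id,
                    "hostNetworkSpec": {
                        "vmNics": [
                            # Pin each physical NIC to the vDS explicitly.
                            {"id": f"vmnic{i}", "vdsName": vds_name}
                            for i in range(4)
                        ]
                    },
                }
            ]
        }
    }

spec = build_expansion_spec("esxi-host-uuid-placeholder")
print(json.dumps(spec, indent=2))
# The actual call would be something along the lines of:
# PATCH https://{sddc-manager}/v1/clusters/{cluster-id} with this body
```

The point is simply that the API lets you state the full pNIC-to-vDS mapping, where the GUI silently assumes two uplinks.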


Management Domain vDS configuration Parameters

In our case, we deployed the Management Domain with 4 pNICs. We opened an SR (support ticket) directly with VMware right after we encountered an issue when upgrading vSphere. Long story short: we had to remove one host from the cluster and decommission it. After recommissioning the host back into VCF, we tried adding it back into the cluster using the GUI. Big mistake! We ignored a warning indicating that adding a new host supports only 2 pNICs. Task failed! Luckily for us, we had a snapshot ready to restore. We rolled back and performed the task through the API, defining 4 pNICs to match the original deployment. The task completed successfully this time.

Warnings are there for a reason

It would be nice to request a feature where adding hosts to an existing cluster with more than 2 pNICs could be performed via the GUI. For now, you are stuck with the API. Not a deal breaker, though; I find myself doing most management tasks through the API nowadays.

Application Virtual Networks in the Management Domain Bring up

When deploying your Management Domain using the deployment workbook and Cloud Builder, you have the option to deploy and configure Application Virtual Networks (AVNs) to leverage NSX-T features for your network configuration, or to use traditional VLAN-backed networks. If no AVNs are deployed, you will have to use VLAN-backed networks.

AVNs are required to deploy vRealize Suite Lifecycle Manager, which in turn is required to deploy vRLI, vIDM, vRA, and vROps locally and in a multi-region setup.

Screen that welcomes you when trying to deploy vRealize Suite if no AVNs were deployed during bring up

Since AVNs require dynamic routing during bring up, you might think that using VLAN-backed networks will be easier. If you are tempted to go this route just to remove the BGP requirement, be advised: it will come back to haunt you in the future! Let me walk you through the process of configuring vRealize Suite Lifecycle Manager with VLAN-backed networks when you do not deploy AVNs during bring up. This is supported by VMware’s KB 78608 and KB 80864. Spoiler alert: you still need to deploy an NSX-T Edge cluster. See the details below:

  1. Create an NSX-T Edge cluster for the vRealize Suite products using SDDC Manager.
  2. Create an NSX-T VLAN transport zone.
  3. Add the VLAN transport zone to the host transport nodes and Edge cluster nodes.
  4. Create NSX-T Data Center VLAN segments.
  5. Modify configuration files in the SDDC Manager to point to the VLAN networks.
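As a rough sketch of step 4 above, a VLAN-backed segment in the NSX-T Policy API boils down to a VLAN ID plus a reference to the VLAN transport zone from step 2. The segment name, VLAN ID, and transport zone path below are placeholders for your environment; confirm the schema against the NSX-T Policy API reference.

```python
import json

def build_vlan_segment(vlan_id, tz_path):
    # Minimal VLAN segment body: the VLAN backing the segment and the
    # path of the VLAN transport zone created in step 2.
    return {
        "vlan_ids": [str(vlan_id)],
        "transport_zone_path": tz_path,
    }

segment = build_vlan_segment(
    vlan_id=110,
    tz_path="/infra/sites/default/enforcement-points/default"
            "/transport-zones/vlan-tz-placeholder",
)
print(json.dumps(segment, indent=2))
# Typically applied with something like:
# PATCH https://{nsx-manager}/policy/api/v1/infra/segments/xreg-m01-seg01
```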

On top of that, you are required to configure the load balancing features for the cross-region configuration, which the AVN deployment handles automatically during bring up. Not a big deal if you are planning to go with the NSX Advanced Load Balancer (formerly Avi).

Password Management

Once your setup is in place, password management for vSphere and NSX-T must be done through the SDDC Manager under Security > Password Management. Bear in mind that NSX-T accounts have an expiration, and you need to decide whether disabling the NSX-T account password expiration or rotating passwords through the SDDC Manager is the way you want to go. The latter option is the recommended one from a security standpoint, but be advised: expired passwords can cause you some minor trouble if they are not rotated in time.
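The rotation can also be driven through the SDDC Manager API, which is handy for scheduling it before the NSX-T expiration hits. Below is a hedged sketch of what such a rotation request could look like; the resource name is a placeholder and the exact body schema should be checked against the VCF API documentation for your version.

```python
import json

def build_rotate_spec(resource_name, resource_type="NSXT_MANAGER"):
    # Hypothetical credentials-rotation body, mirroring what "Rotate"
    # under Security > Password Management does in the GUI.
    return {
        "operationType": "ROTATE",
        "elements": [
            {
                "resourceName": resource_name,
                "resourceType": resource_type,
                "credentials": [
                    {"credentialType": "API", "username": "admin"},
                ],
            }
        ],
    }

spec = build_rotate_spec("nsx-mgr-placeholder.example.local")
print(json.dumps(spec, indent=2))
# Sent as something like:
# PATCH https://{sddc-manager}/v1/credentials with this body
```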

Password Management through the SDDC Manager

NSX-T Post Deployment Configurations

After the initial bring up, and when deploying new clusters with their own NSX-T Manager cluster, some NSX-T configurations might not be applied, and manual intervention might be required.

In our case, some configurations were missing after the NSX-T deployment for the Management and Workload Domains. Note that this was with VCF 4.0 (probably fixed in later versions). Some things we needed to adjust or configure during the process:

  1. The vSphere cluster did not have a Transport Node Profile associated, so we had to assign one.
  2. The TEP IP pool did not exist; we had to create it and associate it with the Transport Node Profile.
  3. The uplink profile teaming was kind of iffy. We adjusted it to our requirements.
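For item 2 above, the TEP pool we recreated is just a static IP pool in NSX-T. Here is a sketch of what its subnet definition could look like via the NSX-T Policy API; all addresses and the pool path are placeholders, and the field names should be verified against the Policy API reference.

```python
import json

def build_tep_subnet(start, end, cidr, gateway):
    # Static subnet for a TEP IP pool: the allocation range hosts draw
    # their tunnel endpoint addresses from, plus CIDR and gateway.
    return {
        "resource_type": "IpAddressPoolStaticSubnet",
        "allocation_ranges": [{"start": start, "end": end}],
        "cidr": cidr,
        "gateway_ip": gateway,
    }

subnet = build_tep_subnet("172.16.50.10", "172.16.50.50",
                          "172.16.50.0/24", "172.16.50.1")
print(json.dumps(subnet, indent=2))
# Typically applied under a pool path such as
# /policy/api/v1/infra/ip-pools/{pool-id}/ip-subnets/{subnet-id},
# after which the pool is referenced from the Transport Node Profile (item 1).
```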

These configurations might not have been applied correctly because we deployed our Management and VI Workload Domains with 4 pNICs; we already owned the hardware and needed to repurpose it accordingly. If you stick with the standard 2-pNIC configuration, you might not encounter the issues described above.

Hoping this blog entry has been informational and useful. Make sure to take advantage of the tips above during your journey with VCF + VCD. See you next time!
