After three or four days of trying to get Fault Tolerance (FT) working in a nested vSphere 5.0 VM, I am reaching out to the community for assistance. My efforts to enable FT on my nested vSphere VMs have moved from a logical troubleshooting progression to unstructured thrashing.
The physical vSphere server uses a Xeon E5-2620 (Sandy Bridge) CPU and has 32 GB RAM. The VMs are located on a separate physical iSCSI storage server. vSphere 5.0 U2 is used for both the physical and nested vSphere installations. The nested vSphere VMs are configured with two vCPUs, 6 GB RAM, and four E1000 NICs. The physical and virtual vSphere servers are joined to a VDS, where port groups and vmkernel interfaces support vSphere management, two-interface multipath iSCSI, and Fault Tolerance.
I followed William Lam’s nested vSphere 5.0 and nested EVC cluster documents when creating the nested vSphere virtual machines and prepping the physical vSphere host. The physical host's /etc/vmware/config file was modified to include the vhv.allow = "TRUE" attribute, and running esxcfg-info | grep "HV Support" returns a value of 3. The host server is a member of a Sandy Bridge EVC cluster, which the nested vSphere server has also joined.
The nested vSphere server successfully joins the EVC cluster; however, the “Host Configured for FT” status is “No”. Clicking the detail icon gives the following reason for FT support being inactive: “Host CPU does not support hardware virtualization which is required for replay”. Running esxcfg-info | grep "HV Support" from the nested vSphere server’s DCUI returns a value of 3.
Other than FT not being supported, the nested vSphere server functions as expected: external iSCSI storage is accessible, vMotion allows migration of VMs between the nested and physical vSphere servers, etc.
One possible symptom of the problem appears to be related to the CPUID masking for EVC support, again following William’s blog posting. Setting the CPUID masks causes the nested vSphere VM to power on using the Intel Merom (Core 2) EVC Mode. Curiously, the nested vSphere host's vCenter object shows the host cluster object running in Sandy Bridge EVC Mode. Removing the CPUID masking, disabling EVC on the cluster, and rebooting the nested vSphere server does not enable FT support. Attempting to enable EVC in this state fails for all EVC modes (as expected). The following are the CPUID masks used, based on the EVC cluster CPUID mask, and applied to the nested vSphere VM per William’s article.
Leaf 1
eax 0000:0000:0000:0010:0000:0110:1010:0010
ebx
ecx 0001:0111:1001:1000:1110:0010:0011:1111
edx 1000:1111:1110:1011:1111:1011:1111:1111
Leaf 80000001
eax
ebx
ecx 0000:0000:0000:0000:0000:0000:0000:0001
edx 0010:1000:0001:0000:0000:1000:0000:0000
Leaf d
eax 0000:0000:0000:0000:0000:0000:0000:0111
ebx
ecx 0000:0000:0000:0000:0000:0011:0100:0000
edx 0000:0000:0000:0000:0000:0000:0000:0000
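Since the FT error complains about missing hardware virtualization, it may help to check the relevant feature bit in the masks above. The following is a minimal Python sketch, not part of any VMware tool: the helper names are hypothetical, and it assumes each string lists bits 31 down to 0 with a 1 meaning the feature bit is set. It only handles the 0/1 characters that appear in the values posted here. Bit 5 of leaf 1 ECX is the Intel VT-x (VMX) flag.

```python
# Sketch: decode the colon-grouped CPUID bit strings listed above.
# Assumption: strings are written most significant bit first (bit 31 .. bit 0).
# mask_to_int and bit_is_set are hypothetical helpers, not VMware APIs.

LEAF1_EAX = "0000:0000:0000:0010:0000:0110:1010:0010"
LEAF1_ECX = "0001:0111:1001:1000:1110:0010:0011:1111"
VMX_BIT = 5  # CPUID leaf 1, ECX bit 5 = VMX (Intel VT-x)


def mask_to_int(mask: str) -> int:
    """Convert a 32-bit colon-grouped binary string to an integer."""
    bits = mask.replace(":", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError(f"expected 32 characters of 0/1, got {mask!r}")
    return int(bits, 2)


def bit_is_set(mask: str, bit: int) -> bool:
    """Return True if the given bit position (0 = least significant) is 1."""
    return (mask_to_int(mask) >> bit) & 1 == 1


if __name__ == "__main__":
    print(f"leaf 1 eax = {mask_to_int(LEAF1_EAX):#010x}")
    print(f"leaf 1 ecx = {mask_to_int(LEAF1_ECX):#010x}")
    print(f"VMX bit set in leaf 1 ecx: {bit_is_set(LEAF1_ECX, VMX_BIT)}")
```

Under those assumptions, the leaf 1 ECX value decodes to 0x1798e23f with the VMX bit set, so the value as posted does not appear to hide VT-x by itself.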
A new nested vSphere server was created to validate the modifications made to the original nested vSphere server. It exhibited the same behavior, and Fault Tolerance was inoperable. I would be grateful for any assistance in overcoming the issue(s) preventing FT on my nested vSphere VMs. Please let me know what additional information I can provide to aid in the troubleshooting process.
Jonathan