ESXi Hosts Timing Out During HA Cluster Election

[Guest Post by Jeremy Reiman]

Summary:
  • ESXi hosts time out during the HA cluster election phase, after the cluster master has been selected. In vCenter, the HA Agent status shows as unreachable on every host that timed out.

Symptoms:
  • The ESXi host fails to enable the HA Agent and reports the error "operation timed out".
  • The error message "[ClusterManagerImpl::IsBadIP] x.x.x.x is bad ip" appears in /var/log/fdm.log on the ESXi hosts.
  • A tcpdump capture from an ESXi host shows that packets destined for the IP address of another ESXi host are sent to the MAC address of the firewall. They should go to the MAC address of the other host's management interface, since both hosts reside on the same VLAN.
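
To see which addresses are being flagged, a copy of fdm.log can be scanned for the IsBadIP message. The following is a minimal sketch, not part of the original troubleshooting: the function name count_bad_ips is hypothetical, and it assumes only that each affected line contains the message text quoted above, whatever timestamp or thread prefix precedes it.

```python
import re
from collections import Counter

# Matches the message quoted in the symptoms. Anything earlier on the line
# (timestamp, thread info) is ignored because search() scans the whole line.
BAD_IP_RE = re.compile(
    r"\[ClusterManagerImpl::IsBadIP\]\s+(\d{1,3}(?:\.\d{1,3}){3})\s+is bad ip"
)


def count_bad_ips(lines):
    """Return a Counter mapping each flagged IP address to its hit count.

    `lines` is any iterable of strings, for example an open file object.
    """
    counts = Counter()
    for line in lines:
        match = BAD_IP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Called as count_bad_ips(open("fdm.log")), it returns a per-address count. If the flagged addresses are the management IPs of the other hosts in the cluster, that is consistent with the symptoms listed above.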

Configuration Info:
  • ESXi host management interfaces are on the same VLAN.
  • ESXi 4.1 or later.
  • The firewall is a Cisco ASA5500 running software version 8.2(2).
  • A Firewall Services Module (FWSM) running 3.2(5) is also affected.
  • All network ports are open on the firewall between the vCenter server and the ESXi hosts.

Resolution:
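  • Disable proxy ARP on the ESXi host management VLAN. With proxy ARP enabled, the firewall answers ARP requests for the other hosts' management IP addresses, which is why their traffic ends up at the firewall's MAC address. The Cisco ASA5500 command to disable proxy ARP on a VLAN interface is "sysopt noproxyarp <vlan_interface_name>".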
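
After the change, it is worth confirming that peer management IPs no longer resolve to the firewall. The sketch below is illustrative only: the function names and the neighbors mapping (peer IP to MAC address, copied by hand from a host's ARP table or a packet capture) are assumptions, not output of any specific ESXi or Cisco command.

```python
def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits.

    This lets notations such as 00:1B:21:AA:BB:CC and 001b.21aa.bbcc
    compare equal.
    """
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def peers_resolving_to_firewall(neighbors, firewall_mac):
    """Return the peer IPs whose ARP entry points at the firewall's MAC.

    `neighbors` is a hypothetical dict of peer IP -> MAC address.
    """
    firewall = normalize_mac(firewall_mac)
    return sorted(
        ip for ip, mac in neighbors.items() if normalize_mac(mac) == firewall
    )
```

An empty result means no peer on the management VLAN is being answered for by the firewall. Any address that is returned is still being proxied, and stale ARP entries on the host may need to age out before the result changes.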

Logical Configuration Diagram Example:

Port Usage Details:
