vSAN: The cascade scenario that vSAN stretch cluster has issues with...
Summary: While testing a stretch cluster, we ran into strange failover behavior: the failover was simply not occurring. During this testing, we found a dirty little secret about stretch cluster failovers, one that makes me rethink whether stretch clusters are really worth doing.

Documented Failure Scenarios

Details: All documented scenarios effectively deal with a single type of failure. The problem is that disasters and failures can be multi-faceted and, in some instances, cascading. Take the "Secondary Site Failure or Partitioned" scenario, add a cascading failure to it, and you end up in a whole world of trouble, depending on what the next failure is.

What is shown below effectively depicts the failure of the interconnect between the two sites. What it fails to take into account is that there are typically three things involved:

- The networking between the two sites
- The preferred site routers
- The secondary site routers

So here is a slightly ...