vSphere 5 has many new and exciting features. This post will concentrate on High Availability(HA) and how it affects blade designs. While HA is certainly not new, it has been rewritten from the ground up to be more scalable and flexible than ever. The old HA software was based on Automated Availability Manager (AAM) licensed from Legato. This is why HA had its own set of binaries and log files.
One of the problems with this “now legacy” software was the method it used to track the availability of host resources. HA prior to vSphere 5 used the concept of primary nodes. There were a maximum of (5) primary nodes per HA cluster. These nodes were chosen by an election process at boot time. The (5) primary nodes kept track of the cluster state so that when an HA failover occurred, the virtual machines could restart on an available host in the cluster. Without the primary nodes, there was no visibility into the cluster state. So, if all (5) primary nodes failed, HA could not function.
This was not usually an issue in rackmount infrastructures. However, it posed some challenges in a blade infrastructure where a chassis failure can cause multiple blades to fail. Blade environments should typically have at least two chassis for failover reasons. If there was only a single chassis providing resources for an HA cluster, that single chassis failure could cause an entire cluster outage. You’ll seen in the diagram below that just because multiple chassis are used does not mean that the entire HA cluster is protected.
<a …
(Read more…)

