Business continuity planning (BCP) is already a mainstay at many midsized enterprises. A 2012 AT&T online survey of IT executives from organizations throughout the United States found that over 80% of them, including businesses with revenues of no more than $25 million, have BCP in place. The survey respondents, who have primary responsibility for BCP in their organizations, also shared the following:
- 86% have a business continuity plan in place
- 63% fully tested it in the previous twelve months
- 29% have invoked it at some point in their past
The degree to which these organizations have deployed BCP is to be commended and is certainly a vast improvement over what it has been in the past. However, BCP is just the beginning of the journey to a highly available environment, not the destination. The results of this survey still leave serious questions as to:
- How successful was the test?
- How much of their data did they recover?
- How long did the recovery take?
- How many IT staff were needed to execute the recovery?
- How much did it cost?
Even for those that did a BCP test in the last twelve (12), six (6) or even two (2) months, this test does not translate into assurances that their recovery will work today. As any organization knows, IT environments – especially those that are virtualized – change very quickly. New applications and VMs are constantly added. The IT staff that conducted the last test may no longer work at the company. Hardware may have been upgraded or replaced. These changes create uncertainty as to how recoverable their environment truly is.
Compounding the problem, enterprises are questioning BCP’s overall value. Aside from the cost to build or lease a secondary site to host these BCP exercises, the computer gear in these sites is often idle or heavily underutilized, as it is only used during BCP activities while incurring operating expenses the rest of the time.
The shortcomings of BCP explain why enterprises are coming to the conclusion that uninterrupted application availability, with its automated application failover and failback, is a better option than BCP. The ready availability of cloud technologies means that an enterprise of almost any size can realistically implement and maintain this environment, potentially more affordably and easily than BCP. Yet achieving this higher form of high availability (HA) for all of their applications requires that they first put in place the right set of technologies.
Achieving a “Higher” Form of HA
There are six (6) specific technologies that enterprises need to put in place to achieve this “higher” form of HA for the applications in their environment.
1. Hypervisor clustering. Affordable, scalable HA starts with virtualization, as it is needed to facilitate non-disruptive failover and failback of all applications. Using VMware vSphere with its vMotion and Metro Storage Cluster (MSC) capabilities, organizations can transparently fail over and fail back virtual machines (VMs) and their applications between different physical sites.
2. Array-based synchronous replication. VMware vMotion and MSC move VMs back and forth between different physical machines at different sites. However, the data of these VMs must be moved separately. Ensuring that the data is already at the target site when a VM comes up requires array-based synchronous replication software, such as HP 3PAR Remote Copy, which keeps the data associated with the VMs in sync between the two sites.
3. ALUA. Asymmetric Logical Unit Access (ALUA) is used by hypervisors such as VMware vSphere to communicate with backend storage arrays. ALUA provides multi-pathing (two or more storage networking paths) to the same LUN on a storage array, marking one path “Active” and the other “Passive.” The status of the paths may be changed either manually by the user or programmatically by the array, for example in the event of a failure.
4. ALUA-aware storage arrays. Should the status of the paths be reversed (“Active” paths become “Passive” and vice versa), the storage array must notify the hypervisor host so that the hypervisor can re-scan the network for the currently active paths.
5. “Active-Active” controllers on storage array. Active-Active controllers, such as those found on an HP 3PAR StoreServ, make all LUNs “active” on all controllers of the array. This mesh-active controller architecture negates ALUA’s initial use case (managing “Active” and “Passive” paths to a single storage array), freeing ALUA to be used for other purposes.
6. Quorum Witness. A Quorum Witness is software that resides on either a physical or virtual server and monitors the availability of each site. Should one site go down or offline, it detects the failure, notifies both VMware and the HP 3PAR StoreServ array of this condition and has both of them fail over application operations to the recovery site.
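To make the Quorum Witness role more concrete, the decision it has to make can be sketched in a few lines of Python. This is a simplified illustration only, not how VMware or HP 3PAR implement it: the class name `QuorumWitness`, the heartbeat model, the site labels "A" and "B" and the 15-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumWitness:
    """Toy witness: tracks heartbeats from two sites and picks an action.

    Hypothetical model for illustration; names and timeout are assumptions.
    """
    timeout_s: float = 15.0      # assumed: silence longer than this means "down"
    active_site: str = "A"       # site currently running the applications
    last_seen: dict = field(default_factory=lambda: {"A": 0.0, "B": 0.0})

    def heartbeat(self, site: str, now: float) -> None:
        """Record that `site` reported in at time `now` (seconds)."""
        self.last_seen[site] = now

    def is_up(self, site: str, now: float) -> bool:
        """A site counts as up if it was heard from within the timeout."""
        return now - self.last_seen[site] <= self.timeout_s

    def evaluate(self, now: float) -> str:
        """Return 'stay', 'failover_to_<site>' or 'halt'."""
        standby = "B" if self.active_site == "A" else "A"
        if self.is_up(self.active_site, now):
            return "stay"                    # active site is healthy
        if self.is_up(standby, now):
            self.active_site = standby       # recovery site takes over
            return f"failover_to_{standby}"
        return "halt"                        # neither site is reachable


if __name__ == "__main__":
    witness = QuorumWitness()
    witness.heartbeat("A", 100.0)
    witness.heartbeat("B", 100.0)
    print(witness.evaluate(110.0))   # both sites heard from recently
    witness.heartbeat("B", 120.0)    # site A has gone silent
    print(witness.evaluate(120.0))
```

In a real deployment the "failover" result would be the point at which the witness notifies both the hypervisor and the storage array; here it is only a returned string so that the logic stays easy to follow.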