Since the inception of VCS (Veritas Cluster Server), end-users have had access to significantly higher levels of reliability and availability on heterogeneous platforms such as AIX, Linux, HP-UX, Solaris and Windows for their critical, tier-1 applications. Now, with a decade of clustering critical business applications under its belt, Symantec has the experience and understanding of what customers expect from high availability (HA) software and what they need to make it successful in their shops.
Now enter VMware. Customers can and should plan to use the advanced features offered by VMware such as VMotion, Storage VMotion, and DRS (Distributed Resource Scheduler). However, these features lack a couple of critical components that are needed for application high availability: application awareness and the ability to account for unplanned downtime (e.g., user errors).
While VMware’s offerings certainly provide a streamlined process for planned downtime, their awareness is limited to the VM layer. This is problematic when one considers that most failures do not occur at the hardware level but instead occur at the application level and are too often unplanned. In this respect, VCS complements VMware’s offerings: together they increase application availability by accounting for both hardware and software mishaps.
VCS accomplishes this by taking the availability of applications running on VMware virtual machines (VMs) to a new, application-aware level. First off, everything you have grown accustomed to with VCS on physical machines still exists when running in a VMware environment, such as local, metro, and long-haul clustering as well as failover. VCS also works with storage controller replication technologies, such as EMC’s SRDF and HDS’s Universal Replicator, to coordinate the movement and ownership of data between different systems and then fail over the application to the alternate site. Because it is application aware, it also carries forward its existing deep integration with applications such as Exchange, Oracle, SAP, SQL Server and others, which companies deem necessary when configuring these application servers for HA.
The VCS software puts an application agent on each VM on the physical ESX server, and the agent takes away little or none of the performance (CPU, memory, network, storage I/O) the VMs need. In return, VCS provides a deeper understanding of how the application uses the VM’s and the physical machine’s resources, and it is integrated with the ESX service console. If VCS detects that an application is beginning to encounter problems, it can fail over to an alternate physical or virtual machine. VCS also removes the requirement for a standby or idle machine, as you can run VCS in an active-active mode of operation. In this mode, you can take advantage of the hardware investment at the remote location.
One very exciting feature available in VCS for VMware is the ability to cluster and fail over VMware’s VirtualCenter server. VirtualCenter acts as the fulcrum of your VMware management environment, but it comes with some dependencies in order for it to operate. Specifically, the VirtualCenter platform is dependent on Windows and needs a functional VM guest. In these environments, VirtualCenter failover and clustering are imperative since VirtualCenter runs on a physical machine. Only VCS makes it a reality.
Now let’s look at a failure scenario. Assume you have three physical VMware servers: two at a primary site and one at a remote location for DR and business continuity. Three VMs are running on Server 1 at the primary site, with VCS configured on all of them. Assume that only one application, running in VM #2 on Server 1, crashes due to a faulty patch, virus, or worm. At this point, VCS can take over and ensure that the VM guest on the second physical VMware ESX server is in a recovered state before bringing it online. It can also ensure that the application has not suffered any loss of data. Now the VM can be seamlessly failed over to the other physical ESX server at the primary location.
Take this a step further and assume you lose both ESX servers at the primary site, the shared storage, or both. Since the data was replicated via EMC SRDF (or another major storage array vendor’s replication technology), VCS will now automatically take over and fail some or all of the VMs and their hosted applications and storage over to the remote location, with a single click.
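The two scenarios above come down to a simple placement rule: prefer another healthy server at the same site, and move to the remote site only when the local servers or the local shared storage are gone. The following is a minimal sketch of that rule in Python. It is a toy model for illustration, not VCS’s actual logic or API; the names Host, pick_failover_target, and storage_ok are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A physical ESX server in this toy model (hypothetical structure)."""
    name: str
    site: str            # e.g. "primary" or "remote"
    healthy: bool = True


def pick_failover_target(hosts, current_host, storage_ok=True):
    """Return the host an application should move to, or None if there is none.

    Prefers a healthy host at the same site while that site's shared
    storage is usable; otherwise falls back to a healthy host at another
    site, which is assumed to hold a replicated copy of the data.
    """
    candidates = [h for h in hosts if h.healthy and h.name != current_host.name]

    # Local failover: only possible if the local shared storage survived.
    if storage_ok:
        for h in candidates:
            if h.site == current_host.site:
                return h

    # Site failover: rely on the replicated storage at another site.
    for h in candidates:
        if h.site != current_host.site:
            return h

    return None


hosts = [Host("esx1", "primary"), Host("esx2", "primary"), Host("esx3", "remote")]

# Scenario 1: the application in a VM on esx1 crashes; esx2 is still healthy.
print(pick_failover_target(hosts, hosts[0]).name)   # esx2

# Scenario 2: both primary-site servers are lost.
hosts[0].healthy = False
hosts[1].healthy = False
print(pick_failover_target(hosts, hosts[0]).name)   # esx3
```

In a real cluster this decision also depends on application state, replication status, and configured failover policy, none of which this sketch models.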
VCS also has a very innovative feature called Fire Drill, which lets an end-user test the failover scenarios we’ve looked at above. This feature verifies that all the resources (storage, network, replication, applications, and VM dependencies) are operating within their functional means. Fire Drill allows you to test every aspect of the cluster failover, with the exception of DNS changes, all in real time and without impacting the availability of your production applications to the business. This can be very useful for validating your disaster recovery and business continuity plans.
Adding VCS to your VMware environment allows you to move the reliability and availability of your VMware infrastructure to a more complete local and global level. These techniques for clustering VMware will give you and your organization the confidence needed to deploy critical business applications in a virtualized environment without sacrificing the application availability and integration to which you have become accustomed.