As small and midsize businesses (SMBs) virtualize their servers at an increasing pace, many fail to consider the impact this change has on how they do backups, or that it affects their backups at all. Since many IT administrators responsible for backups in these environments would freely admit to not being backup gurus, here is some insight into how server virtualization changes backup and what SMBs need to know about backup as they implement virtualization in their environments.
In January 2012, IDC found that 25-30% of all servers globally are now virtualized and the number is expected to be nearly 50% by 2012. While possibly shocking to some, these numbers are reasonable because the advantages to virtualization are real.
However, as SMBs virtualize, backup and restore of virtual machines (VMs) is emerging as a major pain point that often hinders their deployment. While virtualization does not “break” the backup process, virtualization may open up a number of gaps in data protection and it can cause backup agents to run so inefficiently that the backup process itself breaks, causing IT managers to re-think their backup strategy and solution.
Five factors in particular hurt the traditional backup process in a virtualized environment:
1. Fewer resources are available to virtual hosts. Virtualization, by definition, means sharing the resources of the physical host among VMs. However, backup software agents running on that same physical host are also very resource intensive, especially under the traditional agent-based approach to backup.
An agent requires significant CPU resources to calculate what files or blocks need to be backed up and to compress data for transmission. Agent-based backups are also disk-intensive because the agent does a full scan of all files to be backed up.
Agent-based backups also require network bandwidth to move the backup to the central backup repository. However, backup traffic now competes with the virtualized applications for the limited network bandwidth available, potentially slowing both backups and production applications.
The big takeaway here is that physical resources on hosts that were previously available for use during backups are no longer available. By continuing with agent-based backup in a virtualized environment, each VM’s backup is often fighting with backups on other VMs for the same available but now limited physical resources.
2. VMs use virtual disks. To avoid resource contention, some administrators consider backing up directly from the storage volumes where the VM data resides. While this is usually safe for the VM’s metadata (configuration files, for example), it is usually not a good idea for the VM’s virtual disks (VMDKs).
Current virtualization technologies commonly create their VMDKs within a larger physical disk pool. These virtual disks are usually then implemented as one or more large files, usually gigabytes in size.
However, this presents a problem for standard backup tools. Without knowledge of the internal structure of the VMDK, the backup software is forced to treat any change inside the virtual disk as a change to the file as a whole.
This is extremely inefficient and in some cases may require full backups of each VMDK. The other risk is that if the VM is not powered down or suspended during the backup, the VMDK may not be in a consistent state so the metadata may not match the state of the VMDK.
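To make the cost concrete, here is a minimal Python sketch comparing the two views of a virtual disk. It is an illustration under stated assumptions, not how any particular product works: the 4 KB block size, the in-memory "disk" and all function names are hypothetical, and hashing blocks is just a simple stand-in for whatever change information a hypervisor-integrated tool actually receives.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of the disk image."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indexes of blocks in `new` that differ from `old` or did not exist before."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or digest != old_hashes[index]
    ]


def file_level_bytes_to_copy(old: bytes, new: bytes) -> int:
    """A tool that sees the image as one opaque file copies all of it on any change."""
    return len(new) if old != new else 0


def block_level_bytes_to_copy(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """A block-aware tool copies only the blocks whose contents changed."""
    return sum(
        len(new[index * block_size:(index + 1) * block_size])
        for index in changed_blocks(old, new, block_size)
    )


# Simulate a tiny "virtual disk" of 256 blocks and change a single byte in block 10.
disk_before = bytes(BLOCK_SIZE * 256)
modified = bytearray(disk_before)
modified[10 * BLOCK_SIZE + 5] = 0xFF
disk_after = bytes(modified)

print(file_level_bytes_to_copy(disk_before, disk_after))   # 1048576 (the whole image)
print(block_level_bytes_to_copy(disk_before, disk_after))  # 4096 (one block)
```

A one-byte change forces the file-level view to recopy the entire image, while the block-level view moves a single block. Scale the image from 1 MB to tens of gigabytes and the gap is the inefficiency described above.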
3. Difficult to detect new VMs. One of the big advantages of virtualization is how easy it is to create new VMs. However, without integration between the backup software and the virtualization platform, it is difficult to detect when a VM is created. Backup administrators then need to either rely on server administrators to install a backup agent on each new VM or have some manual process in place to detect the creation of a new VM. Otherwise, it will go unprotected.
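The manual detection process usually boils down to comparing two lists: the VMs the virtualization platform knows about and the VMs the backup software protects. A rough Python sketch of that comparison follows. The VM names and both function names are made up for illustration, and in practice the inventory would come from the platform's management interface rather than a hard-coded set.

```python
def find_unprotected_vms(inventory: set[str], backup_catalog: set[str]) -> set[str]:
    """VMs that exist on the platform but are covered by no backup job."""
    return inventory - backup_catalog


def find_stale_entries(inventory: set[str], backup_catalog: set[str]) -> set[str]:
    """Backup jobs that still reference VMs the platform no longer reports."""
    return backup_catalog - inventory


# Hypothetical data: what the platform reports vs. what backup currently protects.
inventory = {"vm-mail", "vm-web", "vm-test-01"}
backup_catalog = {"vm-mail", "vm-web", "vm-old-fileserver"}

print(sorted(find_unprotected_vms(inventory, backup_catalog)))  # ['vm-test-01']
print(sorted(find_stale_entries(inventory, backup_catalog)))    # ['vm-old-fileserver']
```

The comparison itself is trivial. The hard part is that without integration, someone has to remember to gather both lists and run it every time a VM is created.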
4. VMs can move around (migration). Backup software has historically made the reasonable assumption that applications would not move from one physical server to another. In a virtualized environment, however, one of the key benefits is that VMs can move from one physical host to another. This process is called “migration”. Unfortunately, migration can leave a VM assigned to no backup agent and therefore unprotected.
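The following Python sketch shows one way a migration can open that gap. It assumes a simplified model in which each physical host has a backup job listing the VMs it expects to find there. The host names, VM names and both functions are hypothetical and are not taken from any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    vm_id: str
    host: str  # physical host the VM is currently running on


def protected_by_host_binding(vm: VM, host_jobs: dict[str, set[str]]) -> bool:
    """Host-bound model: a VM is backed up only if the job on its current host lists it."""
    return vm.vm_id in host_jobs.get(vm.host, set())


def protected_by_vm_policy(vm: VM, policy_vm_ids: set[str]) -> bool:
    """VM-bound model: protection follows the VM's identity, wherever it runs."""
    return vm.vm_id in policy_vm_ids


# Hypothetical setup: each host's job lists the VMs it was configured with.
host_jobs = {"host-a": {"vm-mail"}, "host-b": {"vm-web"}}
policy_vm_ids = {"vm-mail", "vm-web"}

before = VM(vm_id="vm-mail", host="host-a")
after = VM(vm_id="vm-mail", host="host-b")  # same VM after migrating to host-b

print(protected_by_host_binding(before, host_jobs))  # True
print(protected_by_host_binding(after, host_jobs))   # False: host-b's job does not list it
print(protected_by_vm_policy(after, policy_vm_ids))  # True
```

Tying protection to the VM rather than to the physical server it happens to run on closes the gap, but the backup software can only do that if it knows where each VM currently lives.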
5. Granular restores are problematic. The ability to quickly and easily back up an entire VM virtual disk (VMDK) is one of the big attractions of doing VM backups. However, it is also one of its largest drawbacks. Most backup software has no visibility into the VMDK because it just looks like a big file, so the only restore that most backup software can do is a restore of the entire virtual machine to a specific point in time.
The downfall of this approach is that most restores performed are not restores of entire servers or VMDK files. They are restores of individual files, records or emails contained within the virtual disk. So to restore a single file or email, an administrator must first use the backup software to restore the entire VMDK file and then go into the virtual machine itself to retrieve the needed file or email.
It is for these reasons that traditional agent-based backup is not the right approach for protecting a virtualized environment. So while the first inclination of many administrators may be to use this approach as it is what they know, it creates backup problems that are, for all practical purposes, unsolvable.
Overcoming the new backup challenges caused by virtualization requires a new approach to backup. This approach requires tight integration with the hypervisor and management layer of the virtualization environment so it can take advantage of the new backup features available in these hypervisors.
But taking a new approach doesn’t eliminate the need to back up physical servers and applications like Exchange and SQL Server. Most IT environments will have a mix of virtual and physical servers for the foreseeable future, so just solving the five problems identified above is not enough. The new backup strategy must also continue to protect all of the existing physical servers and applications that may not even have virtualization software running on them.
The good news is that there are products on the market today that address these issues and give administrators even better options to back up and recover data than they are accustomed to today.
In the second blog entry in this series we investigate the major features to look for in backup software for virtualized environments.
In Part III in this series we look at how to achieve success in doing backup of VMs without first needing to become a “real” backup expert in order to do so.