CDP and deduplication are now at the forefront of the minds of more enterprise managers as they contemplate how best to introduce disk-based data protection into their backup environment. Contributing to the difficulty in selecting one of these technologies is that they address different data protection needs: CDP provides shorter application recovery time and recovery point objectives, while deduplication reduces disk storage requirements. Further, companies need to account for how individual blocks of data are protected to prevent large-scale data loss should a single block of data be lost.
To better understand how Asigra’s Televaulting delivers all three features – autonomic healing, CDP, and deduplication – in an agentless fashion, I spoke with Marc Staimer. Marc is the President of Dragon Slayer Consulting based in Beaverton, OR, and is known as one of the leading storage analysts in the network storage and storage management industries. Most of his consulting work is in the areas of strategic planning as well as product and market development. In this second of a two-part interview with Marc Staimer, Marc discusses how Asigra’s Televaulting implements these features.
Jerome: CDP is typically implemented using an agent or in the data path using some type of appliance or FC switch. In the case of Asigra’s Televaulting, how does CDP work?
Marc: Asigra’s Televaulting agentless architecture adopts the LAN’s existing network security and uses administrative privileges to access any server on the network and collect changes to the data. CDP technology generally works in one of two fashions: creating copies of write I/O as writes occur or constantly scanning the server looking for changes to data.
In Asigra’s case, it uses a hybrid method to do CDP. File and directory change notifications are captured in real time, and changed files are marked as requiring backup. The application then picks up any files marked for backup as soon as possible, isolates the changed data blocks, and backs up only those blocks. By capturing only change notifications rather than the actual I/O content, Asigra can adjust the speed of the CDP depending on the link to the backup vault and provide maximum data protection given a specific connection speed: if the link is fast, data protection would be virtually instant; if the link is slow, there may be delays of a few seconds or minutes until the changed content is backed up.
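The hybrid approach Marc describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Asigra's implementation: the `ChangeTracker` class, the fixed 4-byte block size, and the in-memory `files` dictionary are all hypothetical. The point it shows is that a notification records only which file changed, and the changed blocks are worked out later, when the backup pass runs.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks keep the example readable


def block_hashes(data):
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


class ChangeTracker:
    """Hypothetical tracker: remembers which files changed, not what was written."""

    def __init__(self):
        self.dirty = set()      # paths marked as requiring backup
        self.last_hashes = {}   # path -> block digests from the previous backup

    def notify_changed(self, path):
        """Handle a change notification; only the path is recorded."""
        self.dirty.add(path)

    def backup_pass(self, read_file):
        """Back up every marked file, returning only the blocks that changed."""
        sent = {}
        for path in sorted(self.dirty):
            data = read_file(path)
            new = block_hashes(data)
            old = self.last_hashes.get(path, [])
            sent[path] = {
                i: data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
                for i, digest in enumerate(new)
                if i >= len(old) or old[i] != digest
            }
            self.last_hashes[path] = new
        self.dirty.clear()
        return sent


# Usage: the first pass sends every block, the second only the modified one.
files = {"a.txt": b"AAAABBBBCCCC"}
tracker = ChangeTracker()
tracker.notify_changed("a.txt")
first = tracker.backup_pass(files.__getitem__)

files["a.txt"] = b"AAAAXXXXCCCC"
tracker.notify_changed("a.txt")
second = tracker.backup_pass(files.__getitem__)
```

Because the backup pass is decoupled from the notification, it can run as often as the link to the vault allows, which is the adjustment Marc refers to.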
Jerome: Is CDP on by default in Asigra Televaulting?
Marc: No, but CDP is user-selectable. More importantly, CDP is part of the Televaulting software and users do not need to pay extra to obtain this feature.
Jerome: But won’t using CDP consume large amounts of extra disk space on the DS-Client? (Note: The DS-Client hosts the Asigra Televaulting software and can run on an existing or dedicated server. The DS-System is the central repository that collects data from the individual DS-Clients.)
Marc: While the DS-Client will collect more data with CDP turned on, it won’t amount to a significant amount of extra data since the data is deduplicated as it is collected.
Jerome: Explain how CDP and deduplication work in conjunction with one another.
Marc: The changes to the data are accumulated and deduplicated at the DS-Client at the block level. Deduplication is done inline as the data arrives using variable length blocks. The deduplicated data on the DS-Clients is sent to the central DS-System which does a global deduplication of the data across all of the DS-Clients sending data to it. In essence, Asigra Televaulting uses a unique combination of inline and post-processing to deduplicate data.
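To make the two tiers concrete, here is a minimal Python sketch of variable-length chunking followed by local and then global deduplication. Everything in it is an assumption made for illustration: the toy boundary rule in `chunk`, the `BlockStore` class, and the `client_backup` function are hypothetical and say nothing about Asigra's actual algorithms. For simplicity the global check happens immediately rather than as a separate post-processing step.

```python
import hashlib

WINDOW = 4      # bytes examined when looking for a chunk boundary
MASK = 0x07     # cut when the window sum has its low three bits clear
MIN_CHUNK = 4   # never cut before this many bytes (must be >= WINDOW)
MAX_CHUNK = 64  # always cut at this many bytes


def chunk(data):
    """Toy content-defined chunking: boundaries depend on the bytes themselves."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        window_sum = sum(data[i - WINDOW + 1:i + 1])
        if (window_sum & MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


class BlockStore:
    """Hypothetical store keyed by digest; stands in for a DS-Client or DS-System."""

    def __init__(self):
        self.blocks = {}

    def put(self, block):
        """Store the block if unseen; return (digest, was_new)."""
        digest = hashlib.sha256(block).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = block
        return digest, is_new


def client_backup(data, client_store, system_store):
    """Deduplicate locally, send only locally new blocks, deduplicate again globally."""
    recipe, sent, stored = [], 0, 0
    for block in chunk(data):
        digest, new_locally = client_store.put(block)
        recipe.append(digest)
        if new_locally:
            sent += 1
            _, new_globally = system_store.put(block)
            if new_globally:
                stored += 1
    return recipe, sent, stored


# Usage: two clients back up identical data to one central system.
data = b"the quick brown fox jumps over the lazy dog " * 20
system = BlockStore()
client_a, client_b = BlockStore(), BlockStore()
recipe_a, sent_a, stored_a = client_backup(data, client_a, system)
recipe_b, sent_b, stored_b = client_backup(data, client_b, system)
restored = b"".join(system.blocks[d] for d in recipe_a)
```

In this sketch the second client still sends the blocks it has not seen locally, but the central store keeps nothing new, which is the effect of the global pass across DS-Clients.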
Jerome: One of the criticisms of deduplicating data is the amount of time it takes to recover data. How does Asigra mitigate this concern?
Marc: Restoring processed data in Asigra’s case adds very little processing overhead: the data blocks are stored in sequential order so they can be retrieved very fast, and decryption and decompression are very fast in terms of CPU processing. The real issue here is that the data does not sit in the backup as a 1:1 copy on disk, so it requires a restore. Even though the reconstruction and decryption are extremely fast, writing to disk may not be.
It also depends on the amount of data you are trying to recover. Is it a file, a volume or an entire site? If it is a file or a volume, recovering deduplicated data does not take that much more time than recovering data stored in its native state. If recovering an entire site due to a disaster, companies usually have bigger problems than the time it takes to reconstitute deduplicated data, such as where they will recover their business and how to restore data when WAN speeds could be a bottleneck. In these circumstances, enterprises and Managed Service Providers (MSPs) will reconstitute the data first and put it on some type of portable media (a removable disk cartridge or even an entire disk subsystem) that they can drive or overnight to the DR site rather than trying to send the data through a WAN pipe. Another frequently used option is to restore data at LAN speed in the data center and then ship the preconfigured hardware with the data already restored.
Jerome: Explain Televaulting’s autonomic healing feature and why it is important.
Marc: Since Televaulting deduplicates data, one block of data is potentially used by multiple files. If that block of data is corrupted or damaged, it could potentially impact the recoverability of all the files that use that block of data. Televaulting verifies the integrity of the data in the backup process by collecting a certain amount of extra data to confirm the data is accurate. If the integrity check fails then Televaulting requests that data from the server again before permanently storing it.
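The verify-then-store behavior can be illustrated with a short Python sketch. It assumes the "extra data" is a SHA-256 checksum delivered alongside each block; that choice, along with the `FlakySource` class and the `store_with_verification` function, is hypothetical and used only to show the idea of asking for the data again before committing it.

```python
import hashlib


class FlakySource:
    """Hypothetical server that returns a corrupted block for the first N fetches."""

    def __init__(self, blocks, corrupt_first_n=1):
        self.blocks = blocks                  # block_id -> non-empty bytes
        self.remaining_bad = corrupt_first_n

    def fetch(self, block_id):
        """Return (data, checksum); the checksum always describes the good data."""
        good = self.blocks[block_id]
        checksum = hashlib.sha256(good).hexdigest()
        if self.remaining_bad > 0:
            self.remaining_bad -= 1
            damaged = bytes([good[0] ^ 0xFF]) + good[1:]  # flip the first byte
            return damaged, checksum
        return good, checksum


def store_with_verification(fetch, block_id, store, max_attempts=3):
    """Store a block only once its checksum matches; re-request it otherwise."""
    for attempt in range(1, max_attempts + 1):
        data, checksum = fetch(block_id)
        if hashlib.sha256(data).hexdigest() == checksum:
            store[block_id] = data
            return attempt  # number of fetches that were needed
    raise IOError("block %r failed verification %d times" % (block_id, max_attempts))


# Usage: the first fetch is corrupted, so the block is requested a second time.
source = FlakySource({"b1": b"payload"}, corrupt_first_n=1)
vault = {}
attempts = store_with_verification(source.fetch, "b1", vault)
```

The design point is that nothing reaches permanent storage until it has passed the check, so a damaged block cannot silently become the single copy that many files depend on.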
If you would like to contact Marc directly you may do so at marcstaimer@comcast.net.