In the last year or so, a number of articles and blog posts have tried to answer the question, “What is the best approach for deduplicating data during disk-based backup?” Unfortunately, what these pieces fail to address is a more fundamental question: “What objectives are enterprise organizations trying to accomplish with disk-based backup and recovery?” Without first establishing those objectives, it is difficult to reach any meaningful conclusion about how best to proceed with deduplication.
In my experience, enterprise organizations are trying to accomplish three main objectives when solving their backup and recovery problems:
- Stop the backup pain. They want their IT staff to focus on higher-level IT issues rather than the traditional backup-centric questions: which backup jobs failed, why they failed and how to fix them.
- Use disk without breaking the bank. It is no longer any secret that using disk as the primary target and source for backup and recovery is almost universally viewed as the right way to stop the pain. However, the amount of disk that organizations need can be a significant obstacle to adopting disk as a backup target.
- Consolidate backup data stores. Consolidating backup data stores eliminates the cost and overhead of managing backup processes at each individual site, makes more effective use of disk storage, improves management of the data and opens up new options for disaster recovery.
It is because of these last two objectives that deduplication quickly becomes part of the backup conversation, especially in enterprise shops, which have so much data to back up that deduplication's benefits are clearly evident. Deduplication makes the adoption of disk financially viable, facilitates the consolidation of backup data stores and creates the new disaster recovery options that enterprises want.
However, in its excitement over delivering on these last two objectives, an organization can forget about or downplay its primary goal: ending its backup pain.
This is why selecting the appropriate deduplication approach for an organization's environment is critical. If the end user focuses only on the last two objectives and ignores or underweights the first, the organization is right back where it started: daily morning meetings to troubleshoot problems from the previous night's backups.
It is for these reasons that enterprise organizations should give preference to post-process solutions. These technologies do not limit ingest performance with CPU-intensive deduplication processing. By first storing backup data directly on disk and then deduplicating it, a post-process approach avoids the ingest performance impact of inline approaches.
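The distinction can be sketched in a few lines of code. This is a minimal, hypothetical model (fixed 4 KB blocks, SHA-256 fingerprints, an in-memory dictionary standing in for the dedupe store), not any vendor's implementation: inline deduplication puts the hashing work in the ingest path, while post-process landing writes raw data first and defers the identical deduplication math to a later pass.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for this sketch


def inline_write(stream: bytes, store: dict) -> list:
    """Inline model: every block is hashed and deduplicated before it is
    considered written, so ingest speed is gated by this CPU work."""
    recipe = []
    for i in range(0, len(stream), BLOCK_SIZE):
        block = stream[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only unique blocks are kept
        recipe.append(digest)            # recipe rebuilds the original stream
    return recipe


def post_process_write(stream: bytes, landing_area: list) -> None:
    """Post-process model, step 1: land raw backup data on disk at full
    ingest speed; no hashing happens in the backup window."""
    landing_area.append(stream)


def post_process_dedupe(landing_area: list, store: dict) -> list:
    """Post-process model, step 2: run the same deduplication math later,
    outside the ingest path, then release the landing space."""
    recipes = [inline_write(stream, store) for stream in landing_area]
    landing_area.clear()
    return recipes
```

Both paths end up with the same deduplicated store and the same recipes for reconstructing the data; what differs is when the CPU cost is paid relative to the backup window.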
This is not to imply that the other two business objectives should be ignored or discounted. They should not. Enterprises that need to get backup data offsite as quickly as possible after a backup completes will find that solutions such as the SEPATON S2100-ES2 meet these multiple requirements well.
Unlike some other post-process solutions, the S2100-ES2 can begin deduplicating and replicating data as soon as a specific backup job completes. With this approach, data is offsite and ready for recovery much sooner than if deduplication and replication could not begin until all backup jobs were complete.
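A toy timing model makes the benefit of per-job processing concrete. The numbers and the fixed per-job replication cost below are illustrative assumptions, not measured figures from any product: the point is simply that pipelining replication behind each job's completion gets most data offsite earlier than waiting for the whole backup window to close.

```python
def offsite_times_per_job(finish_times: list, replicate_cost: float) -> list:
    """Pipelined model: each job's data is deduplicated and replicated
    as soon as that job finishes."""
    return [t + replicate_cost for t in finish_times]


def offsite_times_batched(finish_times: list, replicate_cost: float) -> list:
    """Batched model: replication cannot start until the last job finishes,
    so every job's offsite time is pushed to the end of the window."""
    window_close = max(finish_times)
    return [window_close + replicate_cost for _ in finish_times]
```

With jobs finishing at hours 2, 5 and 9 and a 3-hour replication cost per job, the pipelined model has the first job offsite at hour 5; the batched model has nothing offsite before hour 12.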
The S2100-ES2 provides a unique grid architecture that enables scalability through the addition of computing nodes and disk shelves. This scale-out design enables SEPATON to address any concerns about an end user’s ability to ingest, deduplicate and replicate data within required time constraints.
This architecture gives organizations the flexibility to add nodes for additional performance, expediting deduplication, replication or even the reconstruction of deduplicated data without impacting either production backup jobs or the time needed to recover the data.
It is not surprising that so many articles and blogs are trying to answer the question of whether inline or post-process deduplication is a better choice. However before organizations can arrive at any meaningful answer to this question, they should first remember what business objectives they initially set out to accomplish and then verify that whatever solution they select accomplishes these objectives.
The deduplication decision becomes even more difficult for enterprise shops. Not only do distinct differences between inline and post-processing deduplication solutions exist, but within each general deduplication approach, individual solutions handle processes such as deduplication and replication differently.
It is for these reasons that enterprise organizations should give preference to post-process deduplication architectures and solutions from providers such as SEPATON that are designed and architected to address the unique challenges of enterprise backup and recovery.