Every small and midsize enterprise (SME) performs a balancing act when making new technology purchases: finding the right trade-off between cost and technology. This particularly holds true for backup, where disk backup coupled with deduplication has become a cost-effective replacement for tape backup while eliminating the headaches of using tape as a primary backup target. But with deduplication available in so many different forms, I wanted to offer SMEs a few tips to help them get the most bang for their deduplication and replication bucks. Deduplication generally arrives in one of the following forms:
- Target deduplication
- Source deduplication
- Hybrid deduplication
- Standalone target deduplication
- Integrated target deduplication
- Standalone source deduplication
- Integrated source deduplication
These labels are only summary overviews; they say nothing about how deduplication actually works in each of these seven implementations. Deduplication can occur at the block, byte, or file level; run inline or post-process; and use a hashing or delta-differential algorithm to identify duplicate data. Some solutions even give users the option to choose which of these deduplication methods to use and under what circumstances. Plus, data may be deduplicated again when it is replicated, creating a dizzying array of ways in which deduplication can be implemented.
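To make the hashing approach concrete, here is a minimal sketch of block-level, hash-based deduplication: data is split into fixed-size blocks, each block is fingerprinted with SHA-256, and duplicate blocks are stored only once. This is an illustration of the general technique, not any vendor's actual implementation; the 4 KB block size and the function names are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def deduplicate(data: bytes):
    """Split data into fixed-size blocks and store each unique block once.

    Returns (store, recipe): store maps a SHA-256 digest to the block's
    bytes; recipe is the ordered list of digests needed to rebuild data.
    """
    store, recipe = {}, []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # a duplicate block is stored once
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the unique blocks."""
    return b"".join(store[d] for d in recipe)
```

In this toy model, two nightly backups that share mostly identical content would also share most entries in `store`, which is where the capacity savings come from.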
Large enterprises are, for the most part, driving all of these different choices in deduplication because of the amount of data they have to protect. How deduplication is implemented affects how well backup data is optimized, how quickly backups and recoveries occur, and ultimately how cost-effective the solution is. It is for this reason that many large enterprises are in the process of implementing deduplication at multiple levels (at the source, in the backup client, and at the target) within their infrastructure.
However, it is unlikely that most SMEs need their deduplication implementation to be this sophisticated. They are simply trying to solve their current backup problems with an affordable disk-based deduplication solution. So the question they need to answer is, “How can SMEs be sure the solution they select delivers the deduplication features they need to meet their requirements?”
The best way for SMEs to handle this question is to first quantify what they are really trying to accomplish, which usually amounts to the following:
- Reduce their backup and recovery windows
- Ensure their backups successfully complete at or near 100% of the time
- Implement an offsite replication and recovery plan
- Minimize their upfront and ongoing costs associated with a deduplication solution
Notice that achieving an “optimal” data deduplication ratio or buying a “large enterprise” deduplication algorithm is rarely, if ever, mentioned as a prerequisite. Rather, SMEs are focused on getting the technology that matches their day-to-day demands. So here are a couple of tips for meeting these demands.
- The “large enterprise” deduplication approach may not be needed for backup data deduplication in SMEs. The deduplication techniques used by large enterprises can arguably achieve deduplication ratios of 15x or greater. At the same time, most disk-based deduplication solutions ship with at least 15 hard disk drives, each with 2 TB of raw storage capacity, so even a “small” disk-based backup solution has 30 TB of raw capacity. Multiply 30 TB by a deduplication factor of 15x and you get 450 TB or more of logical capacity.
That’s impressive! But how many SMEs really need 450 TB or more of logical capacity? If you are only backing up a few hundred GB every night and 1-2 TB on the weekend, that much capacity is likely overkill. Granted, 30 TB of raw capacity may not be quite enough, but solutions like the Revinetix Sentio use alternative deduplication methods that let users easily achieve deduplication factors of 4x and reach 100 TB or more of logical storage capacity, which is more than enough for their needs. Further, disk drives are only getting larger, so a 4x deduplication ratio could easily deliver 180 or 240 TB of logical capacity within a couple of years.
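The capacity arithmetic above is easy to check. A few lines of Python, using the figures from this article (the helper name is my own):

```python
def logical_capacity_tb(drives: int, drive_tb: int, dedup_ratio: int) -> int:
    """Logical capacity = raw capacity (drive count x drive size) x dedup ratio."""
    return drives * drive_tb * dedup_ratio


print(logical_capacity_tb(15, 2, 15))  # 15x on 30 TB raw -> 450 TB logical
print(logical_capacity_tb(15, 2, 4))   # 4x on the same hardware -> 120 TB
print(logical_capacity_tb(15, 3, 4))   # 3 TB drives, a couple years out -> 180 TB
print(logical_capacity_tb(15, 4, 4))   # 4 TB drives -> 240 TB
```

The point of the exercise: even a modest 4x ratio on commodity drive counts clears the 100 TB mark comfortably.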
Now the counterargument is: why should an SME settle for an alternative deduplication method when the “large enterprise” approach achieves higher deduplication ratios? This is where the cost of the solution comes into play. You are likely going to pay for 15 drives regardless of which solution you buy. So if you only need, say, 60 TB of logical capacity and the Revinetix Sentio delivers that while costing less, why pay for “large enterprise” deduplication technology you don’t need?
- The “large enterprise” approach is likely needed for replication. Most SMEs are looking to minimize or even eliminate tape in their disaster recovery environment, but that is only possible if they replicate data offsite. This is why Revinetix replicates data at the byte level: doing so minimizes the amount of data sent over the WAN, so SMEs can ideally keep their existing WAN connections.
Granted, byte-level replication is more resource intensive, so in some solutions replication may spill over into production hours. However, offsite replication tends not to be as time sensitive as backup, so even when this occurs it usually goes unnoticed. Revinetix also sidesteps the issue by letting SMEs schedule replication outside normal business hours.
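To illustrate why byte-level replication is so WAN friendly, here is a toy sketch, an illustrative assumption on my part rather than Revinetix’s actual protocol: compare tonight’s backup against the copy already offsite and ship only the byte ranges that changed.

```python
def byte_delta(old: bytes, new: bytes):
    """Return a list of (offset, bytes) patches where new differs from old.

    Only the changed regions (plus any appended tail) would cross the WAN,
    not the whole file.
    """
    patches, start = [], None
    n = min(len(old), len(new))
    for i in range(n):
        if old[i] != new[i]:
            if start is None:
                start = i              # a changed run begins here
        elif start is not None:
            patches.append((start, new[start:i]))
            start = None               # the changed run just ended
    if start is not None:
        patches.append((start, new[start:n]))
    if len(new) > n:
        patches.append((n, new[n:]))   # data appended since the last copy
    return patches


def apply_delta(old: bytes, patches, new_len: int) -> bytes:
    """Rebuild the new version offsite from the old copy plus the patches."""
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, chunk in patches:
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)
```

When two nightly backups are nearly identical, the patch list is a tiny fraction of the file size, which is exactly the property that lets an SME keep its existing WAN link.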
SMEs primarily want their backup pains to end, but they also want deduplication and replication technologies that strike the right balance between cost and technology. The Revinetix Sentio does exactly that: it applies the right level of deduplication at the right points in the solution without forcing SMEs to become replication and deduplication experts or spend a pile of cash to implement a disk backup solution.