The New Deduplication Debate: Where to Draw the Line

Not that many years ago the debate around how best to deduplicate data centered on inline versus post-processing deduplication as data was archived or backed up. While that debate still simmers, a new one is brewing, spurred in part by the recent announcement that Dell plans to acquire Ocarina Networks. This one touches on where organizations should draw the line on data deduplication.

In the last couple of years the business case for deduplicating archive and backup data, which is characterized by high levels of redundancy and infrequent access, has clearly been made. As organizations look to use disk as their primary target and/or medium for archive and backup, data deduplication drives down the cost of disk to the point where it is as economical as tape while providing the benefits of disk (more reliable backups and recoveries that take less time to complete).

But as one looks to move deduplication up the stack into primary storage, as Dell is apparently looking to do by adding Ocarina Networks’ technology to its line of EqualLogic storage systems, where to draw the line on what data to deduplicate can start to get a little hazy. While introducing deduplication onto primary storage systems can certainly shrink data stores and ultimately lower storage costs, there is no guarantee that deduplication is appropriate for all data residing on primary storage.
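To ground the discussion, the core mechanic behind most deduplication engines is conceptually simple: split incoming data into chunks, fingerprint each chunk, and store only the chunks that have not been seen before. The following is a minimal Python sketch of that idea using fixed-size chunks and SHA-256 fingerprints; the chunk size and hash choice are illustrative assumptions, not a description of Ocarina Networks’ or any other vendor’s actual implementation.

import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size; real systems often use variable-size chunking

def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks, store each unique chunk once,
    and return the list of fingerprints needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # write the chunk only if it is new
        recipe.append(fingerprint)
    return recipe

def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its chunk recipe."""
    return b"".join(store[fp] for fp in recipe)

if __name__ == "__main__":
    store = {}
    original = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # highly redundant data
    recipe = deduplicate(original, store)
    assert rebuild(recipe, store) == original
    print(f"Logical size: {len(original)} bytes; "
          f"physically stored: {sum(len(c) for c in store.values())} bytes")

Running this stores 8,192 bytes for 16,384 bytes of logical data, because the repeated chunks are kept only once. Whether a given data set yields that kind of reduction is exactly the question primary storage raises.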

DCIG analyst James Koopmann argued that one type of data organizations would be wise not to deduplicate with a storage system’s deduplication feature is data found in databases. Properly designed databases deduplicate the data stored in them as a matter of course using a technique called normalization. This is done to eliminate the redundancies that otherwise creep into a database as well as to facilitate quick retrieval of information.
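To make Koopmann’s point concrete, the hypothetical sketch below contrasts a denormalized table, whose repeated customer fields are exactly the kind of redundancy a storage system’s deduplication feature would be left to mop up, with a normalized design that stores the customer record once and references it by key. The table and field names are invented purely for illustration.

# Denormalized: every order repeats the full customer record.
orders_denormalized = [
    {"order_id": 1, "customer": "Acme Corp", "address": "100 Main St", "item": "widget"},
    {"order_id": 2, "customer": "Acme Corp", "address": "100 Main St", "item": "gadget"},
    {"order_id": 3, "customer": "Acme Corp", "address": "100 Main St", "item": "gizmo"},
]

# Normalized: the customer record is stored once and referenced by key,
# so the redundancy never reaches the storage system in the first place.
customers = {1: {"name": "Acme Corp", "address": "100 Main St"}}
orders = [
    {"order_id": 1, "customer_id": 1, "item": "widget"},
    {"order_id": 2, "customer_id": 1, "item": "gadget"},
    {"order_id": 3, "customer_id": 1, "item": "gizmo"},
]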
 
Koopmann even argues that if a storage system vendor promises it can reduce the storage capacity used by a database through deduplication, then the database administrator should be concerned about the quality of the underlying database design. He says, “Reducing the storage requirements for databases through deduplication just puts a Band-Aid on this problem and does not address the real issue.”

However, proper use cases for introducing deduplication on primary storage do exist, and it is one of these that is driving Dell’s interest in Ocarina Networks. The two best use cases I have heard to date for using deduplication on primary storage are on file servers and in virtualized server environments.
 
In the first use case, deduplicating data on file servers, it is unclear how Dell may leverage Ocarina’s technology. While Ocarina Networks’ technology has the ability to do this, Dell does not really play in the enterprise NAS space, so whatever benefits Ocarina Networks provides to Dell in this regard appear to be merely coincidental at this point in time.

The second use case, deduplicating virtualized server images, appears to be the logical role that Ocarina Networks will eventually fill within Dell. Dell’s EqualLogic storage systems are known, if for nothing else, for their ability to deliver iSCSI SANs, and iSCSI as a protocol is wildly popular in virtualized server environments right now.
 
So Dell’s decision to acquire Ocarina Networks makes sense, since up to 90% of the data in virtualized server images may be redundant. Deduplicating these virtual server images reduces the amount of virtual server data stored and should improve the performance of virtual machines, since their images can now be more easily and economically stored on high-performance storage from Dell such as its PS6010XVS.
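To put that 90% figure in perspective with a hypothetical example: 50 virtual machines each carrying a 20 GB guest image represent 1 TB of logical data, but if 90% of it is redundant across images, deduplication needs to physically store only about 100 GB, which makes a high-performance array a far more economical home for those images.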

But as deduplication finds its way onto more primary storage systems, that does not mean organizations should redraw the line on how they use deduplication within their environments so as to exclude archive and backup.
 
CommVault’s Dave West makes the point in a recent blog entry that copies of data used for archive and backup, as well as for compliance and replication, consume orders of magnitude more storage capacity than the original copy of data on primary storage consumes. This is a problem that deduplication on primary storage does not solve. Rather, he argues that using software to deduplicate data across all tiers, including disk, tape and even emerging storage clouds, as CommVault does, slashes storage costs and is key to optimizing storage space and performance across the entire storage infrastructure.
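A rough, hypothetical illustration of the scale involved: a 1 TB primary data set backed up nightly with 30 days of retention generates up to 30 TB of logical backup copies before compliance archives and replicas are even counted, which is why deduplicating only the original 1 TB on primary storage barely dents the total.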

Deduplication is appearing at almost every level of the storage stack and, as it does, the line that enterprises draw as to what data is deduplicated is clearly moving up the stack to now include such applications as file services and server virtualization.

However, organizations should not confuse the growing use of deduplication on primary storage with a replacement for proper storage management practices. The introduction of deduplication on primary storage moves the line but does not eliminate the need for it where it is already used. If anything, because deduplication will enable so much more data to be efficiently stored on primary storage, the need for a comprehensive software-based deduplication solution such as what CommVault offers may be greater now than ever before.
