Pairing deduplication with replication in disk-based backup appliances gives companies a powerful way to protect backed-up data across data centers as well as data backed up at remote and branch offices (ROBOs). Yet where deduplication ends and replication starts can get a little confusing in grid storage architectures such as the NEC HYDRAstor, which features global deduplication capabilities.
To understand how this works, let’s first take a look at this from the perspective of a company that has data centers and ROBOs in different geographic locations. In this scenario, the company may want the flexibility to leverage its primary data center to recover the data backed up at any of its secondary data centers or ROBOs.
To deliver on this ideal, the company would first need to install a HYDRAstor grid at each of its data centers or ROBOs that acts independently of the HYDRAstors at the other locations. Each local HYDRAstor would then deduplicate all data at that site to address that site’s need to shorten backup windows and provide fast recoveries of data at that site.
HYDRAstor’s optional RepliGrid feature then enables the company to protect data from any of its secondary data centers and ROBOs at the primary site. Using this feature, each HYDRAstor at a remote site would replicate its deduplicated data asynchronously back to the HYDRAstor at the main data center on a regularly scheduled basis. The data replication process occurs as follows:
- The HYDRAstor at the remote location provides a list of hash keys for its changed or new data to the HYDRAstor at the primary site.
- The HYDRAstor at the primary site receives the list of hash keys and removes the keys it already has from the list.
- The reduced list of hash keys, which details the deduplicated data the primary site still needs, is transmitted back to the HYDRAstor at the remote site.
- The HYDRAstor at the remote site sends the deduplicated data associated with the hash keys in the reduced list to the HYDRAstor at the primary site.
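The four-step exchange above can be sketched in a few lines of Python. This is only an illustration of the general hash-key negotiation, not NEC's implementation: the `Grid` class, its method names, the use of SHA-256 as the hash, and the `pending` set used to track new or changed data are all assumptions made for the example.

```python
import hashlib


def chunk_key(chunk: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the appliance uses.
    return hashlib.sha256(chunk).hexdigest()


class Grid:
    """Hypothetical stand-in for one deduplicating grid at one site."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}       # hash key -> chunk (local deduplicated store)
        self.pending = set()  # keys that are new since the last replication

    def backup(self, chunks):
        # Local deduplication: a chunk is stored once, keyed by its hash.
        for chunk in chunks:
            key = chunk_key(chunk)
            if key not in self.store:
                self.store[key] = chunk
                self.pending.add(key)

    def missing_keys(self, offered):
        # Steps 2 and 3: drop the keys this grid already holds.
        return [key for key in offered if key not in self.store]

    def receive(self, chunks_by_key):
        self.store.update(chunks_by_key)


def replicate(remote: Grid, primary: Grid):
    offered = sorted(remote.pending)                      # step 1
    needed = primary.missing_keys(offered)                # steps 2 and 3
    payload = {key: remote.store[key] for key in needed}  # step 4
    primary.receive(payload)
    remote.pending.clear()
    return len(offered), len(needed)


primary = Grid("primary")
primary.backup([b"os-image", b"mail-db"])

remote = Grid("branch")
remote.backup([b"os-image", b"branch-files"])

offered, sent = replicate(remote, primary)
print(f"keys offered: {offered}, chunks actually sent: {sent}")
```

In this toy run the branch offers two keys, but only the chunk the primary does not already hold crosses the wire. The hash keys themselves are small compared with the data they describe, which is why exchanging the list first is cheap.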
The distinct business benefits that the HYDRAstor RepliGrid feature offers are two-fold:
- Reduces Storage Costs. The amount of deduplicated data stored on the HYDRAstor at the main data center is minimized. Because the main HYDRAstor performs global deduplication across all local and remote data, it stores only the net new unique deduplicated data from each remote site.
- Reduces Network Bandwidth Costs. Since the main HYDRAstor aggregates data across all of the sites, there is a good possibility that the deduplicated data already exists at the main site. By transmitting only net new unique deduplicated data, RepliGrid minimizes the amount of data transmitted and hence the size of the network pipes needed to transmit it.
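A small worked example helps show why both savings grow as more sites replicate to the same primary. The sketch below is purely illustrative: the function name, the site names, and the single-letter "hash keys" are made up, and each key stands for one deduplicated chunk.

```python
def replicate_sites(primary_keys, sites):
    """Return the primary's final key set and, per site, a tuple of
    (keys offered, keys actually transferred and stored)."""
    stored = set(primary_keys)
    report = {}
    for name, keys in sites.items():
        new_keys = set(keys) - stored  # net new unique chunks only
        report[name] = (len(set(keys)), len(new_keys))
        stored |= new_keys             # later sites benefit from earlier ones
    return stored, report


# Hypothetical data: each letter represents one chunk's hash key.
primary = {"a", "b", "c"}
sites = {
    "branch-1": {"a", "b", "d"},
    "branch-2": {"b", "d", "e"},
}

stored, report = replicate_sites(primary, sites)
for name, (offered, transferred) in report.items():
    print(f"{name}: offered {offered} keys, transferred {transferred} chunks")
print(f"chunks stored at primary: {len(stored)}")
```

Each branch holds three chunks, yet only one chunk per branch is transferred, and the primary ends up storing five chunks rather than the nine it would hold if every site's data were kept separately. Note that the second branch's chunk `d` is skipped only because the first branch had already replicated it, which is the cross-site effect of global deduplication.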
However, offering deduplication and replication is only part of the enterprise data protection picture. As companies look to deduplicate data across their entire enterprise and replicate between multiple sites, spikes in performance and capacity demands may require them to scale these components of the HYDRAstor architecture easily and cost-effectively. I’ll examine how the HYDRAstor accomplishes those tasks in a forthcoming blog entry.