NEC HYDRAstor Uses a Two-Step Inline Deduplication Process Part 1

The grid architecture upon which NEC’s HYDRAstor is designed is unique. By using a grid storage architecture, HYDRAstor is architected to avoid some of the performance and capacity scaling issues that other deduplication appliances intended for the enterprise may encounter. To do this, HYDRAstor uses two types of servers, or nodes: Accelerator Nodes and Storage Nodes, each dedicated to its own set of tasks. To better understand HYDRAstor’s under-the-covers configuration, I recently spoke with NEC’s Director of Business Development, Dr. Christian Toelg.

Jerome: How do the HYDRAstor Accelerator and Storage Nodes differ from one another in their hardware?

Dr. Toelg: For the most part, they use off-the-shelf hardware, though there is one big difference between them. Accelerator Nodes do not need to store large amounts of data like the Storage Nodes do. Since the Accelerator Nodes only need enough storage for the operating system, they just use standard disk drives that are mirrored. Conversely, we want to put as much storage as possible into the Storage Nodes. Currently these nodes use 500 GB SATA disk drives, though 750 GB and 1 TB SATA disk drives for the Storage Nodes will be available in the near future.

Jerome: Since their hardware is similar, how do the nodes differ in the software that operates on them?

Dr. Toelg: The Accelerator Nodes use file systems to present NAS interfaces to the connecting servers; backup clients see the NAS interface presented by the Accelerator Nodes and the Accelerator Nodes see the NAS interface presented by the Storage Nodes. However, what the software on these nodes does under the covers is very different. The Accelerator Nodes pre-process incoming data by chunking it and then deduplicating it. The primary function of the Storage Nodes is to protect the data and then assign the data to the nodes where it is most efficiently and effectively protected.

Jerome: So both the Accelerator and the Storage Nodes deduplicate data?

Dr. Toelg: That is correct. HYDRAstor uses a two-step inline process to deduplicate data. Two or more Accelerator Nodes may see the same file at the same time. However, Accelerator Nodes only have a part of the information required to do deduplication and do not maintain the entire global deduplication index. So the Accelerator Nodes break each file into small chunks, eliminate as many duplicates as possible and then send the remaining chunks to the Storage Nodes. The Storage Nodes receive these chunks of data and then make the final determination as to which chunks are unique and should be stored, which minimizes storage requirements. Data is protected by breaking the unique chunks of data up into fragments and distributing the fragments across the Storage Nodes.
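To make the two-step flow Dr. Toelg describes a little more concrete, here is a minimal Python sketch. It is an illustration only, not NEC's implementation: the class names `AcceleratorNode` and `StorageGrid`, the fixed 4-byte chunk size, the use of SHA-256 fingerprints, and the per-node `seen` set standing in for the "partial information" an Accelerator Node holds are all assumptions made for the example. The fragmenting of unique chunks across Storage Nodes is left out.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny and purely an assumed value


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece: bytes) -> str:
    """Identify a chunk by its SHA-256 digest (assumed hash choice)."""
    return hashlib.sha256(piece).hexdigest()


class AcceleratorNode:
    """Step 1: drop duplicates this node can recognise on its own."""

    def __init__(self) -> None:
        # Hypothetical partial index: only what this node has seen itself.
        self.seen: set[str] = set()

    def preprocess(self, data: bytes) -> list[tuple[str, bytes]]:
        survivors = []
        for piece in chunk(data):
            fp = fingerprint(piece)
            if fp not in self.seen:
                self.seen.add(fp)
                survivors.append((fp, piece))
        return survivors


class StorageGrid:
    """Step 2: final uniqueness decision against the global index."""

    def __init__(self) -> None:
        self.index: dict[str, bytes] = {}

    def store(self, candidates: list[tuple[str, bytes]]) -> int:
        """Store only chunks not already indexed; return how many were written."""
        written = 0
        for fp, piece in candidates:
            if fp not in self.index:
                self.index[fp] = piece
                written += 1
        return written


# Two Accelerator Nodes see the same file at the same time.
grid = StorageGrid()
node_a, node_b = AcceleratorNode(), AcceleratorNode()
data = b"AAAABBBBAAAACCCC"  # chunks: AAAA, BBBB, AAAA, CCCC

sent_a = node_a.preprocess(data)  # 3 chunks survive step 1
sent_b = node_b.preprocess(data)  # also 3: node_b cannot know about node_a
print(grid.store(sent_a))  # 3 chunks written
print(grid.store(sent_b))  # 0 chunks written: step 2 catches the overlap
```

In this toy run each Accelerator Node removes the repeated chunk within its own stream, but only the second step can tell that the two nodes sent the same three chunks. That is the reason for the split: the first step reduces what has to travel to the Storage Nodes, and the second makes the final call.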

Jerome: How do the Storage Nodes protect the data without impacting performance?

Dr. Toelg: HYDRAstor’s two-tier grid architecture has been designed so that many read and write processes can run in parallel by splitting tasks between Accelerator and Storage Nodes and distributing the workload across many nodes. This makes the HYDRAstor unique when compared to monolithic appliances as it allows you to scale performance and capacity in parallel. Furthermore, the HYDRAstor system is constantly monitored for the discovery of new nodes, component failures or the removal of specific nodes from the cluster of Storage Nodes. Background tasks ensure the system is balanced with respect to storage capacity and performance at all times without manual interaction. Distributing tasks this way maximizes read and write performance and ensures that maintenance of stored data, such as migrating data to optimize utilization, deleting data or recovering lost data, is carried out.
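The background balancing mentioned here can be pictured with a short Python sketch. This is a hypothetical illustration rather than HYDRAstor's actual algorithm: the `rebalance` function, the node names and the fragment labels are invented for the example. It balances only by fragment count and ignores real-world constraints such as keeping fragments of the same chunk on different nodes.

```python
def rebalance(placement: dict[str, list[str]]) -> int:
    """Move fragments from the fullest node to the emptiest one until
    node counts differ by at most one. Returns the number of moves."""
    moves = 0
    while True:
        fullest = max(placement, key=lambda node: len(placement[node]))
        emptiest = min(placement, key=lambda node: len(placement[node]))
        if len(placement[fullest]) - len(placement[emptiest]) <= 1:
            return moves
        placement[emptiest].append(placement[fullest].pop())
        moves += 1


# Two Storage Nodes hold six fragments each, then an empty third node joins.
placement = {
    "sn1": [f"frag{i}" for i in range(6)],
    "sn2": [f"frag{i}" for i in range(6, 12)],
}
placement["sn3"] = []  # newly discovered node

moves = rebalance(placement)
print(moves)                                        # 4
print({n: len(f) for n, f in placement.items()})    # {'sn1': 4, 'sn2': 4, 'sn3': 4}
```

The point of the sketch is that adding a node triggers data movement with no operator involvement, and that no fragment is lost along the way.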

Part 2 of this analysis of how NEC HYDRAstor’s Accelerator and Storage Nodes are configured and deduplicate data will appear in the next few weeks. The next part will examine what benefits global deduplication provides and under what circumstances users might expect to reach high deduplication ratios.

Note: This blog entry was updated on 4/14/08 at 11:32 am CST to reflect some new information on how NEC’s HYDRAstor deduplication process works.
