At recent storage conferences (Storage Decisions, Storage Networking World, etc.) replication has emerged as a hot topic of discussion among end-users. In talking with these users and listening in on a number of end-user panel discussions, several factors emerge that account for their growing interest in using replication as part of their companies' overall disk-based data protection strategies.
Paulk revealed that he is now in full production with the production code loaded on the NEC HYDRAstor. However, he is still using the same hardware configuration (two Accelerator Nodes and four Storage Nodes) that he started out with, due to the high deduplication ratio that he is achieving with the HYDRAstor. Last fall he was achieving a 17:1 deduplication ratio and hoped eventually to reach a 35:1 ratio. Six months later, his deduplication ratio is approximately 39:1, which has eliminated his need to buy additional capacity and driven his cost/GB down to approximately 70¢/GB. “It’s like getting 390 TB for the price of 10 TBs,” says Paulk.
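The arithmetic behind Paulk's comparison is straightforward: a deduplication ratio multiplies effective capacity and divides effective cost per GB by the same factor. The sketch below is illustrative only; the 39:1 ratio and the 10 TB / 390 TB comparison come from the article, while the raw cost figure is a made-up input chosen to land on his ~70¢/GB result.

```python
# Illustrative arithmetic only. The 39:1 ratio and 10 TB figure are from
# the article; the raw per-GB cost below is a hypothetical input.

def effective_capacity_tb(physical_tb: float, dedup_ratio: float) -> float:
    """Logical data a deduplicating store can hold for a given physical size."""
    return physical_tb * dedup_ratio

def effective_cost_per_gb(raw_cost_per_gb: float, dedup_ratio: float) -> float:
    """Cost per logical GB falls in proportion to the deduplication ratio."""
    return raw_cost_per_gb / dedup_ratio

# At a 39:1 ratio, 10 TB of physical disk holds ~390 TB of backup data --
# the comparison Paulk draws.
print(effective_capacity_tb(10, 39))          # -> 390.0

# A hypothetical raw cost of $27.30/GB works out to ~70 cents per logical GB.
print(round(effective_cost_per_gb(27.30, 39), 2))  # -> 0.7
```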
NEC’s Vice President of Advanced Storage Products, Karen Dutch, recently brought out some salient points about storage management in her Spring 2008 SNW presentation, “Defining Storage Solutions in the Data Center 2.0”. Specifically, she described the features that new storage architectures should deliver in order to keep storage management manageable as storage growth in organizations continues. Of course, the not-so-subtle message is that NEC’s HYDRAstor delivers on these new features.
A well-known study released by IDC in 2007 forecast that by 2010 the amount of information created and copied in the global digital universe will climb to nearly 1 zettabyte (that’s 1 million petabytes). That number was based on the assumptions that approximately 160 exabytes of information existed in 2006 and that the digital universe would continue to grow at a year-over-year rate of 57%. Assuming that forecast holds true, the total global store of information will reach roughly 400 exabytes by the end of this year.
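The forecast figures can be sanity-checked with simple compounding. The sketch below applies the 57% year-over-year rate to the 160-exabyte 2006 baseline (both from the article); treating the growth as uniform annual compounding is an assumption of this illustration, not something IDC's methodology spells out.

```python
# Sanity check of the IDC forecast arithmetic. Figures are from the article;
# applying 57% as uniform year-over-year compounding is an assumption here.

BASE_EB = 160   # exabytes of information in existence in 2006
GROWTH = 1.57   # 57% year-over-year growth

def digital_universe_eb(years_after_2006: int) -> float:
    """Projected size of the digital universe, in exabytes."""
    return BASE_EB * GROWTH ** years_after_2006

print(round(digital_universe_eb(2)))  # end of 2008: ~394 EB, i.e. roughly 400 EB
print(round(digital_universe_eb(4)))  # 2010: ~972 EB, approaching 1 ZB (1,000 EB)
```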
In part one of this two-part series, NEC’s Director of Business Development, Dr. Christian Toelg, answered some specific technical questions about how Accelerator Nodes and Storage Nodes differ from one another. This second part takes a look at what specific advantages NEC’s HYDRAstor grid storage architecture has over siloed, dual-controller storage system architectures when performing deduplication.
HYDRAstor uses a two-step inline process to deduplicate data. Two or more Accelerator Nodes may see the same file at the same time. However, Accelerator Nodes hold only part of the information required for deduplication and do not maintain the entire global deduplication index. So the Accelerator Nodes break each file into small chunks, eliminate as many duplicates as they can, and then send the remaining chunks to the Storage Nodes.
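The two-step flow described above can be sketched in miniature. This is a hypothetical illustration, not NEC's actual code: it assumes fixed-size chunking and SHA-256 fingerprints, with an "accelerator" class that filters duplicates against its own partial index and a "storage node" layer that holds the authoritative global index and catches whatever the accelerators miss (for example, the same chunk arriving via two different accelerators).

```python
# Hypothetical two-step inline deduplication sketch, not NEC's implementation.
# Step 1: accelerators drop duplicates visible in their own partial indexes.
# Step 2: the storage-node layer's global index drops the rest.

import hashlib

def chunk(data: bytes, size: int = 4096):
    """Split a stream into fixed-size chunks (real systems often use
    variable, content-defined chunking instead)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprint(c: bytes) -> str:
    return hashlib.sha256(c).hexdigest()

class AcceleratorNode:
    """Holds only a partial index; filters what it can see locally."""
    def __init__(self):
        self.partial_index = set()

    def filter_chunks(self, chunks):
        survivors = []
        for c in chunks:
            fp = fingerprint(c)
            if fp not in self.partial_index:
                self.partial_index.add(fp)
                survivors.append((fp, c))
        return survivors

class StorageNodes:
    """Holds the global index; the final arbiter of what is truly unique."""
    def __init__(self):
        self.global_index = {}

    def store(self, survivors):
        written = 0
        for fp, c in survivors:
            if fp not in self.global_index:
                self.global_index[fp] = c
                written += 1
        return written

# Two accelerators see the same data; only unique chunks ever hit disk.
storage = StorageNodes()
a1, a2 = AcceleratorNode(), AcceleratorNode()
data = b"A" * 8192 + b"B" * 4096  # three chunks, two of them identical
written = storage.store(a1.filter_chunks(chunk(data)))   # 2 unique chunks
written += storage.store(a2.filter_chunks(chunk(data)))  # all duplicates
print(written)  # -> 2 chunks stored in total
```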
The primary reason that many deduplicating appliances create data silos is that they are based on the traditional dual-controller storage system architecture. Dual-controller storage systems typically use two clustered servers that sit in front of a fixed pool of storage. By contrast, the NEC HYDRAstor functions as one logical storage system regardless of how much performance or capacity a company adds, so it can globally deduplicate all company archive or backup data stored on it.
Companies then need to bring in more deduplicating backup appliances in an attempt to keep up with data growth. Yet when that occurs, the new appliance is unaware of the first appliance and unable to take advantage of any of the indexed deduplicated data stores that the first appliance has created. As a result, the second appliance must start from scratch and build its own deduplicated data store, thus creating a new data silo. These issues create demand for a new type of storage architecture with global deduplication capabilities, a demand that new products based on grid storage architectures can address.
Deduplicating appliances have gained mindshare with users because they make disk as cheap as, or cheaper than, tape by delivering data reduction ratios of 15:1 or more while expediting backups, which solves their short-term backup problems. However, when selecting a deduplication product, companies also need to consider how well it will serve them in the long term. The capability to globally deduplicate data is very powerful, but most deduplicating storage appliances are limited in scope to just that one appliance.
Data migrations are a painful part of storage management in most enterprise shops today. Driven by storage technology refreshes, storage upgrades, or optimizing data placement on storage systems to improve application performance, data migrations are an ongoing and laborious part of enterprise data management. Introducing disk into the backup process can re-introduce the pain point of data migrations. While the initial benefits that companies derive from using disk in any of its forms in the backup process are usually substantial, the effort associated with managing and migrating backup data from disk to tape over time can become problematic.