In previous blog entries I have made reference to silos of deduplicated backup data stores but have not gone into great detail about the specific problems that data silos create. So in this entry I take a closer look at:
- Why data silos are created
- The problems that data silos can create
The primary reason that many deduplicating appliances create data silos is that they are based on the traditional dual-controller storage system architecture. Dual-controller storage systems typically use two clustered servers that sit in front of a fixed pool of storage. These two servers provide high availability, improved performance and access by either server to the data stored on the backend pool of storage.
Traditional storage systems serve consumers well when storage growth is limited. But this model starts to break down when used in conjunction with deduplicated backup data stores. Anecdotal evidence suggests that data in most companies continues to grow at rates of 50% or more every year. Using deduplication in the backup process does significantly reduce the amount of data that companies need to store. However, over time even deduplicating backup appliances need more performance and storage capacity, and the underlying architecture of traditional dual-controller storage systems does not permit them to scale.
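To put that anecdotal 50% figure in perspective, a quick back-of-the-envelope calculation (assuming a steady compound rate, purely for illustration) shows how quickly a fixed pool of storage gets outgrown:

```python
# Back-of-the-envelope projection: capacity needed after n years of compound
# data growth. The 50% annual rate is the anecdotal figure cited above;
# actual growth rates vary from company to company.

def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Return the storage (in TB) required after `years` of compound growth."""
    return start_tb * (1 + annual_growth) ** years

# Starting from a hypothetical 100 TB of backup data:
for year in range(1, 6):
    print(f"Year {year}: {projected_capacity(100, 0.50, year):.1f} TB")
# At 50% per year, the data store more than triples within three years.
```

Even with deduplication shrinking what actually lands on disk, compounding at anything like this rate eventually exhausts whatever fixed pool of storage sits behind a dual-controller pair.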
Companies then need to bring in more deduplicating backup appliances with isolated backend storage in an attempt to keep up with data growth. Yet when that occurs, the new appliance is unaware of the first appliance and is unable to take advantage of any of the indexed deduplicated data stores that the first appliance has created. As a result, the second appliance must start from scratch and build its own deduplicated data store, which creates a new data silo.
Data silos create problems on a number of fronts. More appliances create more points of management. Backups become more difficult to manage since administrators need to determine which backup jobs to send where. Send too many backup jobs to the new appliance and it may encounter the same performance or capacity problems as the existing appliance. Send too few and the new appliance will not deliver the full benefits of deduplication. In addition, separating backup jobs across isolated deduplicating appliances with different data stores inhibits the ability to deduplicate data globally across all backup jobs.
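The global deduplication point can be illustrated with a toy content-addressed store. The fixed-size chunking, SHA-256 indexing, and in-memory layout here are simplifying assumptions for illustration, not any vendor's actual implementation. Two isolated silos each keep their own copy of a chunk their backup jobs share, while a single global store keeps only one:

```python
import hashlib

class DedupStore:
    """Toy deduplicating store: chunks are indexed by their SHA-256 hash,
    so a chunk already in the index is never stored a second time."""

    def __init__(self):
        self.chunks = {}  # hash -> chunk data

    def write(self, data: bytes, chunk_size: int = 4) -> None:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            self.chunks[hashlib.sha256(chunk).hexdigest()] = chunk

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

backup_a = b"ABCDABCDEFGH"  # two backup jobs that share the chunk "ABCD"
backup_b = b"ABCDWXYZ"

# Silos: each appliance indexes only the data it has seen itself.
silo1, silo2 = DedupStore(), DedupStore()
silo1.write(backup_a)
silo2.write(backup_b)

# Global: one logical store deduplicates across all backup jobs.
global_store = DedupStore()
global_store.write(backup_a)
global_store.write(backup_b)

print(silo1.stored_bytes() + silo2.stored_bytes())  # silos store "ABCD" twice
print(global_store.stored_bytes())                  # global store keeps one copy
```

The siloed pair stores the shared chunk once per appliance, while the global store stores it once in total; spread across thousands of real backup jobs, that difference is the capacity penalty of data silos.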
These issues create demand for a new type of storage architecture with global deduplication capabilities, a demand that products based on grid storage architectures can address. NEC HYDRAstor is one such product; it uses a grid architecture to scale performance and capacity independently. Companies can start with a configuration that meets their immediate data storage needs and budget restrictions, and then scale it without creating additional points of management or new data silos.
Because the NEC HYDRAstor functions as one logical storage system regardless of how much performance or capacity a company adds, it can globally deduplicate all company archive or backup data stored on it. This reduces and optimizes corporate data stores while eliminating the headaches associated with managing multiple appliances and backup jobs.
The adoption of disk-based backup and deduplication is accelerating. As it does, new storage system architectures are needed to help companies solve the types of problems that deduplication and today's data growth create without introducing data silos. Products based on grid storage architectures such as NEC HYDRAstor can help eliminate data silos while giving companies newfound flexibility to manage their growing data volumes and minimize the amount of data they need to store.