One of the more critical pieces of information that organizations need as they put together a disaster recovery plan is how much data they have in their environment and how quickly it is changing. The reason this information is so important is that, without it, organizations often have no way to effectively size how much or what type of capacity they need to protect and recover their production data. In fact, I was astonished at how little information was available about this topic and how few good articles there were on the subject.
Ironically, some of the best information I found on the web about this topic was squirreled away in comments on user forums. For instance, on one forum on the Windows IT Pro web site, an administrator was looking for information on how to appropriately size the network bandwidth between his primary and secondary sites for replication. The best (and only) response anyone provided was for the administrator to look at how much data his organization was incrementally backing up on a daily basis to get some sense of how much data was changing daily in his environment.
That is not a bad idea, but it only gives an organization a snapshot of what data has changed over roughly a 24-hour period. As a result, the organization has no sense of how quickly, or during what times, the data changed within that period. If an organization has 500 GB of production data and 50 GB changes in 24 hours, that data probably did not all change at one moment in time, but neither should the organization assume the change rate was evenly distributed over that period. Using this approach, it is impossible to identify the peak transfer times and what size network pipe is needed.
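To illustrate why the distinction between average and peak change rates matters, here is a minimal sketch in Python using the 500 GB / 50 GB example above. The assumption that 60 percent of the daily change lands in a two-hour burst is purely hypothetical, but it shows how far apart the "evenly distributed" number and the number you actually need to size the pipe for can be.

```python
# Rough bandwidth sizing sketch. The 50 GB daily change figure comes from the
# example above; the peak-window assumption is hypothetical.

GB = 1024 ** 3                         # bytes per gigabyte
daily_change_bytes = 50 * GB           # data changed in a 24-hour period
seconds_per_day = 24 * 60 * 60

# If the change rate were evenly distributed across the day:
avg_mbps = (daily_change_bytes * 8) / seconds_per_day / 1_000_000

# Hypothetical worst case: 60% of the daily change happens in a two-hour
# burst (for example, batch jobs or end-of-day processing).
peak_fraction = 0.60
peak_window_seconds = 2 * 60 * 60
peak_mbps = (daily_change_bytes * peak_fraction * 8) / peak_window_seconds / 1_000_000

print(f"Average rate if changes were evenly spread: {avg_mbps:,.1f} Mbps")
print(f"Rate needed to keep up during the peak window: {peak_mbps:,.1f} Mbps")
```

With these assumed numbers the evenly distributed rate works out to roughly 5 Mbps, while keeping up with the hypothetical peak window requires roughly 36 Mbps, which is the figure that actually determines the size of the replication link.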
The other problem with only looking at incremental backups is that it gives organizations no sense of how much new data was created. So again, using the above example, just because the incremental backup captured 50 GB does not mean that 10 percent of the 500 GB of existing production data changed. Maybe only 15 GB of the existing 500 GB changed while the other 35 GB is net new data. That changes the capacity requirement for recovery at the remote site from 500 GB to 535 GB.
In this day and age, 35 GB of new data in one night is probably not enough to preclude a successful recovery. However, adding that much new data night after night over weeks and months will eventually impair an organization's ability to replicate, restore, and recover its data, since it may inadvertently run out of capacity at its DR site with little or no warning.
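To show how quickly net new data erodes DR headroom, here is a minimal sketch that carries the example forward: the 500 GB production footprint and the 15 GB changed / 35 GB new split come from above, while the 1 TB of provisioned DR capacity is an assumption added purely for illustration.

```python
# Capacity-growth sketch. Production and change figures come from the example
# above; the 1 TB of provisioned DR capacity is a hypothetical assumption.

production_gb = 500          # existing production data
changed_per_day_gb = 15      # changes to existing data: consume bandwidth,
                             # but do not grow the DR footprint
new_per_day_gb = 35          # net new data created each day
provisioned_dr_gb = 1024     # hypothetical DR site capacity (1 TB)

required_gb = production_gb
day = 0
while required_gb <= provisioned_dr_gb:
    day += 1
    required_gb += new_per_day_gb   # only net new data grows the footprint

print(f"Day 1 requirement: {production_gb + new_per_day_gb} GB "
      f"(not just the original {production_gb} GB)")
print(f"A {provisioned_dr_gb} GB DR site is exhausted on day {day} "
      f"({required_gb} GB required)")
```

Under these assumptions the DR site runs out of space in roughly two weeks, which is exactly the kind of "minimal or no warning" scenario a one-time snapshot of incremental backup sizes will never reveal.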
The only effective way to capture data change rates is to use a tool that gathers this type of information in real time. While I am unaware of any software dedicated specifically to this task, there are replication solutions that offer this functionality as part of what they do. For instance, InMage Systems includes a feature called Profiler as part of its more robust Scout replication product.
Scout Profiler essentially does everything that Scout does: it predicts how much data changes daily, examines how well that data will compress, and determines how much bandwidth an organization will need to meet or exceed its applications' service levels. This allows organizations to provision enough bandwidth to meet their applications' requirements at any time of day without overspending on unneeded network bandwidth. Finally, by tracking how much existing data is changing and how much new data is being created, organizations can also forecast how much and what type of storage capacity they need at their DR site to keep pace with the primary site. About the only thing Profiler does not do is actually replicate the data to the remote site.
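For readers who want a rough feel for their own hourly change rates before evaluating a tool like Profiler, here is a crude, file-level sketch of the general idea of interval-based change tracking. It is not meant to represent how Scout Profiler works internally (products like it operate at a much lower level and with far more accuracy); the directory path and sampling interval below are hypothetical.

```python
# Crude, file-level approximation of interval-based change tracking.
# Illustrative only; the watch path and interval are hypothetical.

import os
import time
from collections import defaultdict

WATCH_PATH = "/data/production"   # hypothetical directory to sample
INTERVAL_SECONDS = 3600           # sample once per hour

def snapshot(path):
    """Return {file_path: (mtime, size)} for every file under path."""
    state = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.stat(full)
            except OSError:
                continue
            state[full] = (st.st_mtime, st.st_size)
    return state

changed_bytes_by_hour = defaultdict(int)
previous = snapshot(WATCH_PATH)

while True:
    time.sleep(INTERVAL_SECONDS)
    current = snapshot(WATCH_PATH)
    hour = time.strftime("%Y-%m-%d %H:00")
    for path, (mtime, size) in current.items():
        prev = previous.get(path)
        if prev is None or prev[0] != mtime:
            # Count new or modified files toward this hour's change volume.
            changed_bytes_by_hour[hour] += size
    previous = current
    print(hour, changed_bytes_by_hour[hour] / (1024 ** 3), "GB changed")
```

Even a rough hourly breakdown like this reveals the peaks and the growth trend that a single daily incremental backup number hides, which is precisely the information needed to size both the replication link and the DR site's capacity.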
We live in the Information Age, yet it sometimes seems that the most basic information organizations need to make some of their most strategic and costly decisions is unavailable, and nowhere is that more true than when it comes to understanding DR requirements. The Profiler feature of InMage Systems' Scout gives organizations the information they need to document and understand how data is changing and growing in their current production environment. In so doing, organizations can build and maintain a DR site that meets their needs now and into the future.