Backup is Broken But More than New Technology is Likely Required to Fix It

While doing research this past week for a paper I am working on, I ran across a number of articles from 2013 discussing how to repair broken backup environments. As anyone close to the IT industry knows, backup has been broken for at least a decade. But as these same individuals likely also know, organizations must adopt more than new backup technology to fix it.
To someone who worked in the trenches for many years, the knowledge that “backup is broken” dates back at least to the mid-1990s. Even then, backup was barely functional at best and broken at worst. By 2001, when I took a job at a Fortune 500 data center, the problem was so widely recognized that even my upper management acknowledged there was an issue and wanted to see it fixed.
While we made some progress during my tenure there, the company never really fixed the problem, for a number of reasons. Granted, some of the issues were technical: the technology to address them either did not exist or was only just becoming available. But just as many of the challenges were rooted in how the organization itself was structured, which made it difficult if not impossible to get at the root causes of why backups were failing and correct them.
It appears that part of the reason a number of analysts and vendors promoted this “Backup is broken!” theme in 2013 was to help draw attention to some of the technology fixes that are now readily available. In some of the presentations that I found online, they promoted the use of disk-based backup, continuous data protection (CDP), snapshots and backup software with a “virtualization-first” focus as a means to fix these problems.
These are absolutely the right new technologies to start fixing backup. But by themselves they are likely not going to be enough. There are many more intangibles that organizations must factor into the equation if they are serious about fixing backup. Here are some of the other intangible factors that companies should also consider when they set out to fix it:

  • Dedicate sufficient IT staff to properly manage backup. IT staff already feel overwhelmed by the number of technologies they are responsible for overseeing and managing. As one individual recently told me, “I end up getting calls for anything that plugs into the wall.” Any organization whose IT staff express that sentiment should revisit just how well its backups are being performed. Odds are it needs more staff, better technology or some combination of both. Even outsourcing backup may not be a bad option.
  • Centralize control of the data center/IT environment. It never ceases to amaze me how many organizations are broken down into little fiefdoms. Even today, when most companies acknowledge that consolidated data centers are best (and may even have one), they have not taken the next step of reorganizing their staff to manage that consolidated data center optimally. This results in conflicting processes, redundant products and skill sets that are not aligned with the needs of the applications. As organizations consolidate their IT environments to cut costs, they need to be equally diligent about reorganizing their IT departments, updating their skill sets and then empowering staff to actually do their jobs.
  • End the blame game. At the last company I worked for, when anything went wrong, the witch hunt immediately began to find out who screwed up so blame could be assigned. While I am a big fan of root cause analysis to avoid repeating the same problem in the future, when the outcome of the “blame game” is always someone being fired, it does not take long for individuals to figure out that their first priority is making sure they are not the one assigned blame. In that organization it was common to blame the vendor, since doing so preserved your job and that of the co-worker who might cover your back when the next witch hunt commenced.

A better approach is to identify the root cause of the problem without the constant threat that someone will be fired if they turn out to be its source. Granted, if someone deliberately causes a problem or seeks to cover it up, that is a different issue. But by creating a culture where people (including your vendors) feel safe admitting that they made an error, whether accidentally or out of ignorance, problems can be addressed more quickly.

  • Update and modernize legacy applications. “If it ain’t broke, don’t fix it.” That philosophy sounds good on the surface, as some application configurations just run and have run for years without issue. But not touching an application for a long period of time eventually comes back to bite an organization. Two examples of why this is problematic:

One individual who worked in the IT department of a railroad company once told me that the railroad has COBOL code running on its mainframe that has run flawlessly for years and that no one touches. The problem? No one in the department knows for sure which source code it is based on, or even where that source code is.

Another example: the last company I worked for had a similar application that had run flawlessly for years, with no one ever taking it offline for maintenance. Finally, a mirrored hard drive in the system failed, so they had to take it offline for the first time in many years to replace the drive. When the system started to reboot, it caught on fire.

Situations like these make it impossible to effectively centralize and manage backup and recovery. I can almost guarantee that not a single one of the new backup software vendors with a “virtualization-first” focus will ever develop a version of their backup software that protects these legacy applications. Only by taking steps to update them can you use a common platform to protect and recover them.

Backup is broken, but fixing it is more complicated than just throwing new technology at the problem. Granted, disk-based backup targets, CDP, snapshots and backup software with a virtualization-first focus certainly help. But to truly “fix” backup, organizations need to honestly examine their internal processes, and maybe even their own cultures, if they are going to get to the root of their backup problems and solve them.
