Why is DR and process documentation so bad?
Given the stories of people who have failed disaster recovery (DR) tests because they never documented the processes required to recover their systems, it's clear that the missing link is documentation and process. Why are systems so poorly documented? I see two reasons come up all the time:
- I was too busy building the system to spend the time to document how I built it.
- I am the only one who knows how this system works; if I document it and spread that knowledge, my value to the company decreases, and my job is threatened.
Obviously, the second answer is a thoroughly selfish response, but we're all human, and prone to covering our own backsides. The answer to that fear is simple: if you're the guy who knows stuff, and you spread that knowledge through documentation and training, you become even more valuable to the company. You will still be the de facto expert on that topic, as the first entrant to a field usually is, and you will also be known as someone who picks up new knowledge. That reputation is what gets you assigned to new and exciting work within the company.
The first business lesson I learned came when I asked my boss why he had just fired the only guy who knew all about a particular system. His answer surprised me: "because he's the only guy who knows all about that system". The explanation was simple: the guy had been asked several times to share that knowledge and had refused.
Predicting that the employee would eventually leave (or get run over), the manager decided that he, not circumstance, would choose when to replace the employee and recover the system from scratch. He waited until the system hit a predicted low-usage period, and then relieved the employee of his duties.
I was too busy to document it.
Now back to the first answer: that you're too busy implementing to document how you implemented it.
That's the one I personally struggle with most. Particularly when I'm "trying something out" (trial-and-error programming or administration), I find that stopping to document each attempt interrupts the flow of the task, sometimes fatally.
But that's what version control is for. You should never make so many undocumented changes that you can't extract the difference between your original settings and the new ones, or, more to the point, so many that you can't explain the changes you made.
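As a rough illustration of "extracting the difference", here is a minimal sketch using Python's standard-library `difflib`, assuming your settings live in plain-text files. The function name `settings_diff`, the file label `app.conf`, and the `timeout`/`retries` settings are all hypothetical. In practice your version control tool (for example, `git diff`) produces this for you; the point is that the diff is the raw material for explaining what you changed.

```python
import difflib


def settings_diff(original: str, modified: str,
                  old_label: str = "original", new_label: str = "modified") -> str:
    """Return a unified diff showing how a settings file changed.

    An empty string means nothing changed.
    """
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),  # keep "\n" so lines join cleanly
        modified.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(diff_lines)


# Hypothetical settings, before and after a trial-and-error session.
before = "timeout = 30\nretries = 3\n"
after = "timeout = 60\nretries = 3\n"

print(settings_diff(before, after, "app.conf (before)", "app.conf (after)"))
```

Reading back the `-` and `+` lines is often enough to write the change note; if you can't explain a line in the diff, that's the signal you let the undocumented changes pile up too far.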
For those moments when you want to try something out without automated version comparison, use a virtual machine for your test case. When you have finished, you can document the process as you apply it again to the development system, and someone else can then follow that documented process to apply it to the production system.
Management support
Now you just have to make sure your management chain supports you in spending time both on making changes (to add features or to fix problems) and on documenting them afterwards.
Often, this is as simple as including time for documentation in your estimates.
Don't be tempted to treat the documentation estimate as "float" that you can eat into if the change takes longer than anticipated. Exceeding your planned time is a reason to revise your estimate or reduce your scope, not to cut the documentation. A change that takes longer than anticipated to make will likely also take longer than anticipated to document.