How to pass a Disaster Recovery test
Last week I went to a party hosted by MVP, all-round good guy and host of SBSMigration.com, Jeff Middleton.
Jeff hails from New Orleans, so while he is well known for "swing migration" techniques that let you move your domain between different versions of Windows, he's obviously also quite familiar with good old-fashioned disaster recovery. (Swing migration comes in handy when, say, a small business gets bigger and wants to migrate from Small Business Server to a Windows Server environment with more room for expansion, or just wants to upgrade its domain controller without taking it down for a couple of days to do so.)
That reminded me of a number of occasions where I've spoken to IT Professionals who all shared the same misconception about disaster recovery. Here are a couple of made-up examples of the kind of question I mean:
"Alun, my company just plain sucks at disaster recovery testing. Why, every year, we have a DR test, and every year, we fail at something or another. You'd think we'd be passing them by now!"
"Alun, I just don't get it with these DR tests - they've scheduled another test for later this year, and they're doing it in the middle of the week, when our backups aren't synchronised. Don't they realise that'll make us fail the DR test?"
Okay, you can probably guess where I'm headed.
You're not supposed to pass a DR test.
A DR test is about spotting the problems that you will have in a disaster, and documenting them so that you can determine what needs altering in your disaster plan, whether that means improving your response to a disaster or accepting that some level of loss of service or data will occur.
A DR test where you succeed in recovering everything hasn't told you anything - except perhaps that your test could have been more rigorously designed.
[A colleague of mine tells me of a disaster recovery test at a company that had been doing well in DR tests largely as a result of the efforts of one talented individual who knew everything. At the start of the test, the DR manager pulls the talented guy aside, and says "bad news, Chuck - you were incapacitated in the disaster, and they'll have to recover the systems from the documentation you left behind."]
And, for those of you working in a corporate environment where it is important to you to expand your fiefdoms and/or justify your budget requests, bear in mind that where shortcomings are discovered in DR, money will be allocated to fix them, so that a real disaster doesn't become a calamity. [But it goes better if you predict some of the failings beforehand, or at least whine about how snowed under you are in those areas.]