“There are two kinds of network administrators in this world: Those who back up properly, and those who will.” These words from my former mentor still ring true: most sites I have audited have had no quick recovery method in place. Typically, more than one of the following major procedures had not been implemented:
- UPSs configured to shut down attached servers automatically (not a problem with backup generators).
- Centralized, up-to-date list of every server, its role and brief configuration information.
- Up-to-date ER disks for every Microsoft server (or restore points for 2003); on Novell, autoexec and startup.ncf copies, etc.
- Daily audit of every server’s backup logs and correction when necessary.
- Periodic test of restore from backup tapes.
- Site-specific list of restore procedures for every server type.
- An up-to-date phone tree to contact all IT personnel, assembly areas, priority vendor agreements, etc.
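The inventory, log audit and restore test items in this checklist lend themselves to a small daily script. What follows is only a minimal sketch: the server names, dates and the thresholds (one day for backups, 90 days for restore tests) are hypothetical values chosen for illustration, and a real site would fill the inventory from its own backup product's logs.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Server:
    name: str
    role: str
    last_good_backup: Optional[date]   # None if no successful backup is on record
    last_restore_test: Optional[date]  # None if a restore has never been tested


def audit(servers: List[Server], today: date,
          max_backup_age_days: int = 1,      # assumed threshold, not from the article
          max_test_age_days: int = 90) -> List[Tuple[str, str]]:
    """Return (server name, problem) pairs that need attention today."""
    findings = []
    for s in servers:
        if (s.last_good_backup is None
                or (today - s.last_good_backup).days > max_backup_age_days):
            findings.append((s.name, "backup missing or stale"))
        if (s.last_restore_test is None
                or (today - s.last_restore_test).days > max_test_age_days):
            findings.append((s.name, "restore test overdue"))
    return findings


# Hypothetical inventory for illustration only.
today = date(2004, 6, 15)
inventory = [
    Server("FS01", "file server", date(2004, 6, 14), date(2004, 5, 1)),
    Server("MAIL01", "mail", date(2004, 6, 10), None),
]

for name, problem in audit(inventory, today):
    print(f"{name}: {problem}")
```

Here FS01 passes both checks, while MAIL01 is flagged twice: its last good backup is five days old and its restore has never been tested.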
From the above, we see that disaster recovery can range from the holistic (how do we get our people to a rebuilt data center?) to the simplistic (how am I going to rebuild this single server before users rebel?). My focus is on rebuilding single servers quickly, assuming that the facility is intact. In my last column, I discussed some imaging products. If you have the storage capacity, such as a large SAN, you may store a full image or two from every server. With incremental images, you may store at least a week’s worth of data. With limited online storage capacity, what can you do to ensure rapid rebuilds?
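To judge whether online image storage is realistic for your site, a rough estimate helps. The sketch below simply multiplies out full and incremental image sizes; every figure in it (20 servers, 40GB full images, 2GB of daily change) is a hypothetical example, not a measurement, and it ignores compression.

```python
def image_storage_gb(servers, full_image_gb, fulls_kept,
                     daily_change_gb, incrementals_kept):
    """Rough online storage needed to keep images for every server, in GB."""
    per_server = fulls_kept * full_image_gb + incrementals_kept * daily_change_gb
    return servers * per_server


# Hypothetical site: 20 servers, two 40GB full images each,
# plus a week of 2GB daily incremental images.
total = image_storage_gb(servers=20, full_image_gb=40, fulls_kept=2,
                         daily_change_gb=2, incrementals_kept=7)
print(f"Estimated image storage: {total} GB")
```

With these assumed numbers the total comes to 1,880GB, which shows how quickly the requirement grows with server count.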
Most tape backup products have disaster recovery options that restore images. This saves considerable time: Loading and patching the OS, installing the tape backup software, then restoring from tape can take from a few hours to all day and night, depending on the tape speed and capacity. Why does the IT staff not implement these options? Time spent planning for disaster is recovered when that one improbable sequence of events occurs. I looked at options from CA’s BrightStor line and UltraBac’s UBDR Pro. Two other products, DeTroubler from Blackbird Group for eDirectory and Aelita’s Recovery Manager for Active Directory, permit recovery of single deleted directory objects, as well as other directory recovery options short of full restores. Any one-button restore requires a bootable tape drive, which is limited to certain manufacturers in the Wintel world.
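To see why the manual route can stretch from hours into a full day, it helps to put numbers on it. The sketch below is a back-of-the-envelope estimate under stated assumptions: a sustained tape throughput in MB per second, plus a fixed allowance for loading and patching the OS and backup software by hand. The 90GB, 10MB/s and three-hour figures are hypothetical.

```python
def restore_hours(data_gb, tape_mb_per_s, os_rebuild_hours=0.0):
    """Estimate restore time: manual rebuild allowance plus tape transfer time."""
    transfer_seconds = (data_gb * 1024) / tape_mb_per_s
    return os_rebuild_hours + transfer_seconds / 3600


# Hypothetical server: 90GB of data on a drive sustaining 10MB/s.
manual = restore_hours(90, 10, os_rebuild_hours=3.0)  # load OS and software first
dr_option = restore_hours(90, 10)                     # boot straight into the restore

print(f"Manual rebuild: about {manual:.1f} hours")
print(f"DR option:      about {dr_option:.1f} hours")
```

The transfer time is the same either way; what the DR option removes in this model is the manual rebuild allowance in front of it.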
Another problem is that you may not have the same hardware on which to restore your tape or image. Bare metal restore is not a problem for Novell because the only hardware-specific information necessary, the disk drivers, can be accessed before booting. Microsoft does not recommend restoring to different hardware. I found that if the initial rebuild includes the correct drivers, going to safe mode (F8) after the tape restore usually loads compatible generic drivers, so you can then load the correct versions of critical video and storage drivers. In some cases, you may use F6 to load storage drivers after doing an image restore.
UltraBac combines a comparatively lean Microsoft backup with a variety of DR options and only splits the Registry into users and local machine hives. UBDR Pro, a separate CD, boots the computer into a dedicated, locked XP environment for the UltraBac GUI. I used the GUI environment to restore my image. By checking single file restore during creation of an image, I could restore a single file.
CA’s BrightStor Enterprise Backup also has DR options, though you have to create the boot CD. However, the DR option saves machine-specific information to an alternate server, so any server backed up to an Enterprise server can be recovered even after disaster. Enterprise uses the same tape format as ARCServe, but has extended media vaulting and serverless backup capabilities.
Using DeTroubler, I was able to restore single objects and a partial tree that I had deleted. DeTroubler also backs up and restores hidden objects: KMO (Key Material Object), PKI key pair, Novell Secret Store and so on. The adoptive restore ignores attributes that are not present, so I could restore a tree to a lab machine even though other links and references were unavailable. It can also allow object restore even when schema extensions are unavailable.
Aelita’s ER Disk is a very useful administrator favorite. It creates restore points for backups of system state, protected files, cluster quorum, IIS metabase, boot files, etc. Aelita’s Recovery Manager for AD additionally permits coherent group policy restores.
Tape backup and DR products are never “sexy.” But successful, efficient disaster recovery comes down to one thing: planning. Given these excellent products, you should be able to restore single servers in anywhere from 10 minutes to an hour or two. Remember, a failure to plan is a plan to fail.
Douglas Mechaber is a network engineer and architect who enjoys finding new ways to make his work easier. He gratefully acknowledges the contributions of Rory and Mark of a large Southern California pharmaceutical company, for discussion of the problems of bare metal restore. Share your worst DR or backup snafu with Doug at firstname.lastname@example.org.