Preparing for Crash Scenarios
Murphy’s Law posits that anything that can go wrong will go wrong. And like all laws that have been around for a while, Murphy’s Law has been expanded on a great deal. For instance, there’s MacGillicuddy’s Corollary, which states that anything that goes wrong will take place at the most inopportune time. There’s even a philosophical addendum for Murphy’s Law that goes, “Smile…tomorrow will be worse.”
Many of the people who work with databases follow this pessimistic dictum religiously, as well they should. Their organizations’ information is typically much more valuable than the technologies it runs on. Software and hardware are the servants of data, not the masters of it. Unfortunately, these systems—and the employees who use them every day—are far from perfect, and occasionally fumble that highly valued information. Hence, the database professional’s answer to Murphy’s Law: “Back up early and often.”
Because lost data can cost your company a lot of money and could even cost you your job, you need to prepare for every contingency that could affect your databases. Here are a few things to keep in mind:
Consider Everything That Could Go Wrong
Obviously, some causes of database failure are more common than others. You’re much more likely to suffer a collapse as a result of something like a hardware snafu or an operating system crash. Still, these shouldn’t be your only areas of focus. After all, it very well could be the threat you didn’t consider that winds up getting you.
Physical hazards can cause problems too, such as power outages and actual damage to the brick-and-mortar facilities that house databases. And no part of the world is really entirely safe, either. Take the United States, for example: The Western states have earthquakes, mudslides and forest fires; Florida, the East Coast and the Gulf Coast have hurricanes; and the South and Midwest have tornados and floods.
Staying true to Murphy’s Law, you should consider every possible crash scenario, not just the ones that might involve a technology breakdown or malfunction. The perils you plan for might not happen this week, or even this year—hopefully, they’ll never occur at all. Can you afford to take that chance, though?
Utilize an Array of Strategies
There is no silver-bullet response to every kind of database crash, so—to mix a couple of metaphors—you should avoid putting all of your data eggs in one basket. Instead, use a combination of technologies and techniques to meet as many challenges as possible.
To cite a specific example, certain crash scenarios may require a complete recovery, which involves applying every essential redo log or incremental backup generated for the database being restored. Other crashes might be better suited to what is called point-in-time recovery, which produces a version of the database as it was at some particular point in the past.
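The difference between the two recovery styles can be illustrated with a toy model. The following is only a sketch: the `LogEntry` record, the `recover` function and the integer timestamps are hypothetical stand-ins for illustration, not the format or API of any real database. It assumes a base backup held as a dictionary plus a list of redo entries, and shows that replaying the whole log gives a complete recovery, while stopping at a chosen timestamp gives a point-in-time recovery.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogEntry:
    """One hypothetical redo record: at `timestamp`, set `key` to `value`."""
    timestamp: int
    key: str
    value: Optional[str]  # None means the key was deleted


def recover(base_backup: Dict[str, str],
            redo_log: Iterable[LogEntry],
            until: Optional[int] = None) -> Dict[str, str]:
    """Replay redo entries on top of a copy of the base backup.

    until=None replays every entry (complete recovery). Otherwise only
    entries with timestamp <= until are applied (point-in-time recovery).
    """
    state = dict(base_backup)  # never modify the backup itself
    for entry in sorted(redo_log, key=lambda e: e.timestamp):
        if until is not None and entry.timestamp > until:
            break
        if entry.value is None:
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.value
    return state


base = {"acct:1": "100"}
log = [
    LogEntry(10, "acct:1", "150"),
    LogEntry(20, "acct:2", "75"),
    LogEntry(30, "acct:1", None),  # for example, an accidental delete
]

complete = recover(base, log)                 # state after the last entry
point_in_time = recover(base, log, until=20)  # state just before the delete
```

In this toy case the point-in-time result still contains the record that the last log entry removed, which is the kind of situation where rolling forward only partway is the better choice.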
Back That Thang Up
Again, this comes back to the DBA’s codicil to Murphy’s Law. However, database pros need to devise a specific backup strategy that preserves information but does not hinder the operation of the organization. Thus, you should come up with stages of backups based on time increments such as days, weeks and months. This arrangement should not be developed in a vacuum, but rather through frequent consultation with the company’s leaders and end users.
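As a rough illustration of staged backups, the sketch below classifies each backup as daily, weekly or monthly and decides whether it is still worth keeping. The schedule (monthly on the first of the month, weekly on Sundays) and the retention windows of 7, 35 and 365 days are assumptions chosen for the example, as are the function names; the right values depend on the consultation described above.

```python
from datetime import date


def backup_tier(day: date) -> str:
    """Classify the backup taken on `day` under the assumed schedule."""
    if day.day == 1:
        return "monthly"
    if day.weekday() == 6:  # Monday is 0, so 6 is Sunday
        return "weekly"
    return "daily"


def should_keep(day: date, today: date,
                keep_daily: int = 7,
                keep_weekly: int = 35,
                keep_monthly: int = 365) -> bool:
    """Return True if the backup from `day` is inside its tier's window."""
    age = (today - day).days
    limits = {"daily": keep_daily, "weekly": keep_weekly, "monthly": keep_monthly}
    return 0 <= age <= limits[backup_tier(day)]


today = date(2024, 3, 15)
recent_daily = should_keep(date(2024, 3, 14), today)   # 1 day old
stale_daily = should_keep(date(2024, 3, 5), today)     # 10 days old
old_monthly = should_keep(date(2024, 1, 1), today)     # monthly tier, 74 days old
```

Older daily backups age out quickly while weekly and monthly copies linger, so storage stays bounded without giving up the ability to reach further back in time.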
Some things to ask yourself and others as you work out a database backup plan for your organization include the following:
- What is the frequency of data transactions?
- Are there any times when the system can be shut down for cold backups, or will we have to back up the system while it’s online?
- How valuable is each data object?
- What information can we send off-site or simply delete?
Testing 1, 2, 3
Finally, to be sure that all of your backup and recovery strategies actually work, frequently try out their capability and efficiency in a test environment. This is not just for the sake of the technologies being employed, but also for the functional processes and communications procedures you have in place. A good number of failed recoveries come as a result of organizational blunders and communication breakdowns.
Running tests often can help you resolve minor glitches before they cause major problems after a database crash. Also, it can help you and your fellow database staff familiarize yourselves with the system so that when the real thing comes, you’ll be composed, focused and ready to take on whatever fate throws at you.
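A restore drill can be partly automated. The sketch below uses SQLite from Python's standard library purely as a stand-in for whatever database you actually run; the `orders` table, the `make_source` and `verify_restore` helpers, and the choice of checks (an integrity check plus a row count comparison) are assumptions for illustration. It copies a small database with `Connection.backup` and then confirms that the copy is readable and complete.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build a tiny in-memory database to stand in for production data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)",
                     [(9.5,), (20.0,), (3.25,)])
    conn.commit()
    return conn


def verify_restore(source: sqlite3.Connection,
                   restored: sqlite3.Connection,
                   table: str) -> bool:
    """Check that the restored copy passes integrity_check and has the same
    number of rows as the source. `table` must be a trusted name, since it
    is interpolated directly into the SQL text."""
    ok = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    query = f"SELECT COUNT(*) FROM {table}"
    same_count = (source.execute(query).fetchone()[0]
                  == restored.execute(query).fetchone()[0])
    return ok and same_count


source = make_source()
restored = sqlite3.connect(":memory:")  # plays the role of the test environment
source.backup(restored)                 # copy the whole database
restore_ok = verify_restore(source, restored, "orders")
```

A script like this covers only the technical half of the drill. The processes and communications around a recovery still have to be rehearsed by people.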
–Brian Summerfield, firstname.lastname@example.org