RTO vs. RPO: How Much Downtime Are You Willing To Tolerate?
In his Data Recovery column in Computer Technology Review, Robert Gast discusses the building blocks of data recovery using the following classifications:
- Recovery Time Objective (RTO), i.e., the goal for the maximum time that a recovery operation should ever take.
- Recovery Point Objective (RPO), i.e., the goal for the maximum amount of data that will ever be lost due to a disaster or other data-destroying event.
Interestingly, using these parameters, Gast immediately calls out the shortcomings of tape backup in a couple of instances, the most overt being “Tape-based backups typically saddle you with the longest recovery times of all available options.”
On recovery point, which describes how complete the recovered data will be (for example, whether a technology can reasonably be expected to recover data and transactions entered only minutes earlier), tape is likewise “shish-kebabbed”:
“If you take nightly backups and immediately ship those backup tapes offsite, ignoring for a moment the possibility of defective tape media, you will likely be able to recover data created or updated up to the time that the last backup tape was created. So the recovery point for that tape-based backup solution will be somewhere between a few minutes and about 24 hours. Similarly, the recovery time expectation for tape is a range. Recovery time depends upon how long it takes to retrieve the tape, mount it, and find and copy all the required data. An exact specification cannot be given for such a process, only a range of best-to-worst-case estimates.”
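To make the "range, not a number" point concrete, here is a minimal Python sketch of how those best-to-worst-case estimates might be worked out. The class name, its fields, and every figure in the example are hypothetical illustrations, not values from Gast's column.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Tuple

# A (best case, worst case) pair of durations.
Range = Tuple[timedelta, timedelta]


@dataclass
class TapeBackupEstimate:
    """Hypothetical inputs for a tape-based backup scheme."""
    backup_interval: timedelta  # time between successive backups
    retrieve: Range             # time to get the tape back from offsite storage
    mount: Range                # time to mount the tape
    restore: Range              # time to find and copy the required data

    def recovery_point_range(self) -> Range:
        # Best case: the failure hits right after a backup completes.
        # Worst case: it hits just before the next backup would have run.
        return timedelta(0), self.backup_interval

    def recovery_time_range(self) -> Range:
        stages = (self.retrieve, self.mount, self.restore)
        best = sum((s[0] for s in stages), timedelta(0))
        worst = sum((s[1] for s in stages), timedelta(0))
        return best, worst


# Assumed example: nightly backups, tapes stored offsite.
nightly = TapeBackupEstimate(
    backup_interval=timedelta(hours=24),
    retrieve=(timedelta(hours=1), timedelta(hours=4)),
    mount=(timedelta(minutes=5), timedelta(minutes=15)),
    restore=(timedelta(hours=2), timedelta(hours=8)),
)
print("Recovery point range:", nightly.recovery_point_range())
print("Recovery time range:", nightly.recovery_time_range())
```

The sketch simply adds up the per-stage estimates. It deliberately ignores the defective-media case that the quote also sets aside.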
Getting to the heart of the discussion, Gast uses the example of systems running online banking applications to distinguish RPO from RTO. According to Gast, the bank would likely set an RTO of an hour or less for these systems. In addition, because there are no immediate paper backups for the transactions managed by these applications and because the data in this case represents customers’ money, the bank would likely set an RPO of zero (i.e., no lost data) for these systems. Some small delay in access may be allowable, but not data loss. While RTO and RPO vary from company to company and, within companies, from application to application, few companies have a high tolerance for lengthy recovery times for any of their critical systems.
Citing a survey by the Information Availability Institute, Gast notes that 25.6 percent of the respondents said that their company had a recovery time objective of less than one hour. A further 33.6 percent said that their RTO was two to five hours. A much smaller group, 13.3 percent of respondents, had an RTO of six to 12 hours.
The surveyed companies’ recovery point objectives were also stringent. Fully 39.6 percent of the respondents said that their companies were not willing to accept any data loss due to disasters. A further 16.4 percent said that they would tolerate up to a few minutes’ worth of data loss. Overall, 67 percent of respondents agreed that losing more than an hour of data was absolutely not acceptable.
Ultimately, where does Gast come down on the level at which you should set your RTOs and RPOs? Not surprisingly, the answer is obvious, albeit woefully unrealistic: instant recovery and zero lost data.
Plainly stated, the more hardwired (and firmly entrenched) your RTO and RPO, the more you will have to spend on a recovery system that will meet those objectives. Striking a balance between objectives and spending, suggests Gast, depends on a thorough analysis of your cost of downtime and your cost of lost data, costs that you can avoid or mitigate with an appropriate investment in recovery technologies. That analysis is necessary to determine which technologies you should acquire and how much you should spend on them in order to optimize ROI.
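As a rough illustration of that kind of analysis, the Python sketch below compares recovery options by adding each one's yearly price to the downtime and data-loss costs it would still leave you exposed to. The cost model, the option names, and all dollar figures are assumptions made up for the example. Gast's column does not prescribe a formula.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    """Hypothetical description of one recovery technology."""
    name: str
    annual_cost: float  # yearly spend on the technology
    rto_hours: float    # expected downtime per incident
    rpo_hours: float    # expected window of lost data per incident


def expected_annual_total(option, incidents_per_year,
                          downtime_cost_per_hour, data_loss_cost_per_hour):
    """Yearly spend plus the expected yearly loss that remains."""
    loss_per_incident = (option.rto_hours * downtime_cost_per_hour
                         + option.rpo_hours * data_loss_cost_per_hour)
    return option.annual_cost + incidents_per_year * loss_per_incident


def cheapest_option(options, **costs):
    """Pick the option with the lowest expected total yearly cost."""
    return min(options, key=lambda o: expected_annual_total(o, **costs))


# Made-up figures for illustration only.
costs = dict(incidents_per_year=0.5,
             downtime_cost_per_hour=20_000.0,
             data_loss_cost_per_hour=5_000.0)
tape = RecoveryOption("nightly tape", annual_cost=10_000.0,
                      rto_hours=12.0, rpo_hours=24.0)
replication = RecoveryOption("replication", annual_cost=100_000.0,
                             rto_hours=1.0, rpo_hours=0.0)

for option in (tape, replication):
    print(option.name, expected_annual_total(option, **costs))
print("Lowest expected total:", cheapest_option([tape, replication], **costs).name)
```

With these assumed inputs the pricier option comes out ahead, but the result flips easily if downtime is cheap or incidents are rare. That is why the analysis has to be done per company and per application.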
Oh, and if you use tape for data backup and recovery, your RTO and RPO “mileage” may vary—considerably.