Data De-duplication as Storage Savior
- Date: 25 May 2010
- Author: broyer
- Category: Breakthroughs, Online Backup
Behzad Behtash is rapidly becoming one of my favorite IT columnists at Information Week. His incisive writing and his methodical approach to building out a largely agnostic vision of emerging trends in IT are particularly compelling.
Case in point: his most recent column, “Expanding Role of Data Deduplication.” Data deduplication, especially in the context of primary storage, has found both traction and value in contemporary IT environments. As Behtash reports, of the 400+ business technology professionals surveyed for the publication’s pulse on data deduplication, more than half manage more than 10 TB of data, compared with just 10% who manage less than 1 TB. He observes that just one year ago, 25% of respondents to the same survey managed less than 1 TB of data. The drivers behind this massive change include, according to Behtash, all of the “usual suspects,” such as enterprise databases and data warehouse applications, as well as e-mail. And that is before counting the implementation of the HITECH Act, which aims to have up to 90% of healthcare providers in the United States using electronic medical records by 2020.
What may not be as well known, albeit equal in value to deduplication in primary storage, is data deduplication in the context of secondary storage (in our case, online backup), which looks for repeating patterns of data at the block and bit levels. As Behtash explains, when multiple instances of the same pattern are discovered during backup, the system stores a single copy of the data. After this initial backup, only changed blocks are backed up and written to disk during subsequent jobs, consuming significantly less storage and improving overall efficiency and disaster recovery options. Applied correctly, compression ratios for “deduped” data of 30 to 1 or even higher aren’t uncommon, putting deduplication for secondary storage at least “on par” with deduplication used in association with primary storage.
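To make the mechanics a little more concrete, here is a minimal Python sketch of block-level deduplication as described above. It is an illustration under my own assumptions, not the method from Behtash's column or from any particular product: the `BlockStore` class, the fixed 4 KB `BLOCK_SIZE`, the use of SHA-256 digests as block identifiers, and the in-memory dictionary are all hypothetical choices made for the example.

```python
import hashlib

# Assumed fixed block size for this sketch; real systems may chunk differently.
BLOCK_SIZE = 4096


class BlockStore:
    """Toy store that keeps each unique block once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def backup(self, data: bytes):
        """Back up `data`; return (recipe, bytes_newly_written).

        The recipe is the ordered list of block digests needed to rebuild
        the data. Only blocks not already in the store are written.
        """
        recipe = []
        written = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # first time this pattern is seen
                written += len(block)
            recipe.append(digest)
        return recipe, written

    def restore(self, recipe):
        """Rebuild the original bytes from a recipe of block digests."""
        return b"".join(self.blocks[digest] for digest in recipe)


if __name__ == "__main__":
    store = BlockStore()

    # Initial backup: four blocks, but only two distinct patterns.
    first = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    recipe1, written1 = store.backup(first)

    # Subsequent backup: only the last block has changed.
    second = first[:-BLOCK_SIZE] + b"C" * BLOCK_SIZE
    recipe2, written2 = store.backup(second)

    print(len(first), written1)   # logical size vs. bytes stored on first job
    print(len(second), written2)  # logical size vs. bytes stored on second job
    assert store.restore(recipe2) == second
```

In this toy run the first job stores two blocks for four blocks of data, and the second job stores only the one changed block. The ratios real backups achieve depend on how repetitive the data actually is.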
Behtash concludes the discussion incredulous that, given deduplication’s advantages in backup, replication and disaster recovery, only 24% of survey respondents currently use it, with 44% saying they have no plans for it or won’t use it. I can’t hope to speak for any IT manager trying to back up 10 TB of data on a regular basis, but it seems to me that any chance of reducing both backup time and primary or secondary storage space by as much as 30 to 1 is worthy of at least a cursory glance, if not an in-depth assessment.