Houston, We Have A Problem
- Date: 6 July 2011
- Author: broyer
- Category: News, Online Backup, Services, Virtualization
A storage problem, that is, of ginormous proportions.
Reading through the latest issue of Computerworld, I came across a blogger by the name of Ben Golub, whose article “Enough data to fill a stack of DVDs to the moon (and back)” really caught my attention.
I think it goes back to an unquenchable thirst we humans have to reduce complicated measures of time, distance and sometimes even mass into terms the average person can understand. I imagine most of this goes back to our school days, when teachers asked us to work out how far the moon was from the earth by counting the books you would need to stack to reach the classroom ceiling, and then trebling that number again and again until it reached the atmosphere…or something like that, anyway. Or a fun fact like this one: the surface of the moon, with a diameter of about 2,160 miles, covers roughly the same area as the continent of Africa. As I said, plainly stated metaphors easily digested by the human mind.
Golub does this exceptionally well to build his case that, as a species, we’re generating data by the truckload and, moreover, that this data is of a fundamentally different character than the data of only a decade ago. His examples include:
- The entire works of William Shakespeare represent about 5MB of data, enabling 1,000 copies of his works to comfortably fit on a single DVD. The text in all the books housed in the Library of Congress would fit cleanly on a stack of DVDs the height of a single-story house.
- In 2010 we created enough digital data to fill a stack of DVDs that would stretch from Earth to the moon and back, and the amount of unstructured data created in 2011 is expected to be about 60 percent greater than last year’s total. Golub notes that, according to some estimates, the “stack of DVDs” will reach Mars before this decade is done.
- More than 13 million hours of video were uploaded to YouTube during 2010, and 35 hours of video are now uploaded every minute. Meanwhile, the particle physics experiments at the CERN collider on the border between France and Switzerland generate 40 terabytes every second.
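For anyone who wants to check the DVD arithmetic, here is a rough sketch. The physical figures are my own assumptions, not Golub's: a single-layer DVD holding 4.7 GB, a bare disc 1.2 mm thick with no case, and an average Earth-moon distance of 384,400 km. The function names are purely illustrative.

```python
# Assumed constants (not taken from the article):
DVD_CAPACITY_BYTES = 4_700_000_000   # single-layer DVD, 4.7 GB
DVD_THICKNESS_M = 0.0012             # 1.2 mm per bare disc, no cases
EARTH_MOON_M = 384_400_000           # average Earth-moon distance in metres
SHAKESPEARE_BYTES = 5_000_000        # 5 MB, the figure quoted in the post


def discs_in_stack(height_m: float) -> int:
    """Number of bare discs needed to build a stack of the given height."""
    return round(height_m / DVD_THICKNESS_M)


def stack_capacity_bytes(height_m: float) -> int:
    """Total data held by a stack of single-layer DVDs of the given height."""
    return discs_in_stack(height_m) * DVD_CAPACITY_BYTES


round_trip_m = 2 * EARTH_MOON_M                      # to the moon and back
discs = discs_in_stack(round_trip_m)
zettabytes = stack_capacity_bytes(round_trip_m) / 1e21
shakespeare_copies = DVD_CAPACITY_BYTES // SHAKESPEARE_BYTES

print(f"discs in a round-trip stack: {discs:,}")
print(f"data in that stack: {zettabytes:.2f} ZB")
print(f"copies of Shakespeare per disc: {shakespeare_copies}")
```

Under these assumptions the round-trip stack comes to roughly 640 billion discs holding about 3 zettabytes, and a single-layer disc holds about 940 copies of a 5 MB Shakespeare, so the "1,000 copies" figure is best read as a round number.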
Golub contends that, as the technology matured, storage evolved around the model of a single large database. This transactional database, typically 500 GB in size and representative of, say, a large bank’s transactions and data storehouse, consisted of highly structured rows and columns where losing or corrupting even a byte of data would be disastrous, so enterprises would spend almost anything to protect that data. Block-based storage for structured data followed soon after, delivered as highly reliable, highly expensive, proprietary and (here’s the key) monolithic solutions for which storage manufacturers both hold the fiddle and call the tune (i.e., they charge substantial premiums for them).
In fact, that term “monolithic” reminds me of its namesake in 2001: A Space Odyssey, where scientists were desperately trying to discern the monolith’s meaning, unable to separate its matter or molecular structure into more easily consumable (and perhaps even more revealing) components. To bring the metaphor up to date, think of backing up and restoring only the data you want, rather than restoring all of it and then hunting for a single slice, packet or byte in time. Sounds like the perfect use case for data deduplication.
How times have changed! The dynamic has shifted considerably from structured to unstructured data, such as video and consumer-generated content like music or photos, which can outgrow a 500 GB database quickly, sometimes in a matter of days. As Golub argues, “While this unstructured data is important, and demands good performance and availability, it is simply unfeasible to spend the same amount per GB to support a consumer’s video trove as it is to support a bank’s transactional history.”
Golub suggests that although the price of networked storage has dropped by as much as 15 percent per year, it has been unable to keep pace with Moore’s Law, much less the aforementioned 60 percent growth in the amount of data being created. For an organization experiencing these trends, the storage budget for new data alone would need to grow 45 percent or more per year just to keep pace, likely tripling at the end of five years.
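The budget arithmetic can be sanity-checked with a back-of-the-envelope sketch. It assumes the two rates quoted above stay constant and compound independently, which is my simplification and not necessarily the model behind Golub's figures; the function names are illustrative only.

```python
def yearly_spend_factor(data_growth: float, price_decline: float) -> float:
    """Factor by which spend on new data changes each year.

    Assumes data volume and price per GB compound independently,
    which is a simplification.
    """
    return (1.0 + data_growth) * (1.0 - price_decline)


def spend_multiple(factor: float, years: int) -> float:
    """Spend after a number of years relative to the starting spend."""
    return factor ** years


DATA_GROWTH = 0.60    # 60 percent more data each year (figure quoted in the post)
PRICE_DECLINE = 0.15  # 15 percent cheaper per GB each year (figure quoted in the post)

factor = yearly_spend_factor(DATA_GROWTH, PRICE_DECLINE)
for year in range(1, 6):
    print(f"year {year}: {spend_multiple(factor, year):.2f}x the starting spend")
```

With these inputs the yearly factor is 1.36, so spend passes three times its starting level during the fourth year and reaches about 4.7 times by year five. Subtracting the two rates instead of compounding them (60 minus 15) may be where the 45 percent figure comes from; either way, the direction of the argument is the same.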
Golub’s article includes a matrix of storage demands and costs, past, present and future, built on the premise that storage itself is badly out of line with the demands made of it. He also promises future posts describing what he sees as an upcoming “revolution” in storage.
What do you think? Is the demand for storage within your organization exceeding the space you’ve reserved for it? Are you on the verge of outstripping that capacity? Let us know.
(And if you have any metaphors we can borrow to describe your storage predicament, so we can better understand and take stock of it, feel free to share those too.)