How long can we keep gathering and storing data about everything?
This feature first appeared in the Summer 2017 issue of Certification Magazine.
According to our best records, it took from the dawn of humankind until A.D. 1804 for the global human population to reach one billion people. From there, it took a mere 123 years to add another billion people. In the 90 years since hitting that milestone, we have added another 5.5 billion people to the planet, an average of 1 billion people every 13 years since 1960.
The population explosion of the last two centuries has irrevocably shaped the current nature of our world, including the rapid development of modern computing technology. In particular, our burgeoning population combined with the rapid advancement of information systems has resulted in the generation and collection of massive amounts of data — an immense ocean of raw numbers related to practically every type of activity on our planet.
And we do mean massive amounts. On its What is Big Data web page (http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html), IBM claims we are currently generating 2.5 quintillion bytes of data every single day. What’s more, 90 percent of the data that exists today has been created in just the last two years.
This increase in data generation will be fed by new technologies as they reach critical mass. One example: At the Microsoft Build Conference in May, CEO Satya Nadella told attendees that a fully automated self-driving vehicle will typically generate 100 GB of data per second.
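To put those two figures on the same scale, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The decimal units (1 GB = 10^9 bytes) and the idea of a vehicle running around the clock are assumptions made purely for illustration, not claims from IBM or Microsoft.

```python
# Figures quoted in the article.
GLOBAL_BYTES_PER_DAY = 2.5e18      # IBM: 2.5 quintillion bytes per day
VEHICLE_BYTES_PER_SECOND = 100e9   # Nadella: 100 GB per second (assumes decimal GB)

SECONDS_PER_DAY = 24 * 60 * 60

# Average global data generation rate implied by IBM's daily figure.
global_bytes_per_second = GLOBAL_BYTES_PER_DAY / SECONDS_PER_DAY
global_tb_per_second = global_bytes_per_second / 1e12

# Hypothetical: how many vehicles, generating data nonstop at the quoted
# rate, would match the entire current global total?
equivalent_vehicles = global_bytes_per_second / VEHICLE_BYTES_PER_SECOND

print(f"Global rate: about {global_tb_per_second:.1f} TB per second")
print(f"Equivalent always-on vehicles: about {equivalent_vehicles:.0f}")
```

Under these assumptions, a few hundred such vehicles operating continuously would generate as much data as the whole world does today, which suggests how quickly a single new technology could swell the total.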
Humanity’s ability to capture and store gigantic amounts of data is impressive — but is this activity serving a purpose beyond proving it can be done? Are there benefits to mass data collection, or is it just a technology-enabled form of hoarding?
Home storage technology grows and shrinks
Many software programs have a toolbar button you click when you want to save what you’re working on, be it a document, photo, or other item. This button has a picture of a 3.5-inch floppy disk on it, a once-popular storage format that dominated the 1980s and 1990s.
In its heyday, the double-sided, high-density 3.5-inch floppy disk held 1.44 megabytes of data. Compare that to a current 64-gigabyte USB thumb drive, which can hold over 45,500 floppy disks’ worth of data and costs less than $40.
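The floppy-disk comparison is simple division, sketched below in Python. The capacities are the ones quoted above. The conversion factor is an assumption: counting 1 GB as 1,024 MB gives the article's figure of just over 45,500, while the decimal convention (1 GB = 1,000 MB) gives a somewhat smaller number.

```python
# Capacities as quoted in the article.
FLOPPY_MB = 1.44
THUMB_DRIVE_GB = 64


def floppies_per_drive(drive_gb: float, mb_per_gb: int = 1024) -> float:
    """Return how many 1.44 MB floppies fit in a drive of drive_gb gigabytes.

    mb_per_gb is an assumption about unit convention: 1024 (binary)
    or 1000 (decimal).
    """
    return drive_gb * mb_per_gb / FLOPPY_MB


binary_count = floppies_per_drive(THUMB_DRIVE_GB)            # 1 GB = 1,024 MB
decimal_count = floppies_per_drive(THUMB_DRIVE_GB, 1000)     # 1 GB = 1,000 MB

print(f"Binary convention:  {binary_count:,.0f} floppies")
print(f"Decimal convention: {decimal_count:,.0f} floppies")
```

Either way, one pocket-sized drive replaces tens of thousands of disks.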
Today’s home computer users have access to vast amounts of data storage at remarkably cut-rate prices. Data storage media has consistently grown in capacity while shrinking in size and dropping in cost. If memory cards or thumb drives don’t provide enough storage, there are consumer-level network attached storage (NAS) devices that house multiple hard disks and can easily provide several terabytes of space for data storage.
Beyond physical disks, home users are able to get free or paid access to hundreds of gigabytes of online cloud storage from vendors like Microsoft, Dropbox, and Google.
This wealth of data storage options has made things much easier for the average home user. You no longer have to spend a fortune to keep an extensive digital movie and music library on hand, or to have thousands of photos stored in the cloud. And you rarely have to agonize over which files to delete in order to make room for newer, more desirable items, a conundrum that was standard operating procedure back in the days of hard drives measured in megabytes.
Today’s consumer storage solutions are like having your own virtual industrial park filled with empty warehouses you can keep moving stuff into, with little fear of running out of space.
The purpose and value of Big Data
As noted earlier, human beings are currently generating more data than at any other time in history. This is not just due to population growth — technology has made it possible to capture practically every aspect of our lives. Nearly every action you take that even tangentially involves electronics is potentially being recorded and stored as digital information, which could end up being uploaded to one or more data warehouses.
This may sound like conspiracy theory talk, but people who live in cities effectively exist in a data-collection bubble. Urban dwellers going about their daily lives are surrounded by a multitude of sensors that collect visual, aural, and physical data 24 hours a day.
Your mobile phone — depending on its configuration — will aggressively gather information about your location and activities, data which it interprets (often by sending information to a secondary location) in order to personalize your user experience.
Most people access and use their stored data in bits and pieces. They run a program, play a game, look at photo albums, watch a movie, or listen to music. The desired result for home users with large amounts of data stored is availability — the ability to access the files they want at any given moment.
When it comes to corporations, governments, and science institutions, the motives for professional data hoarding are often murkier. It’s clear that these groups are interested in gathering as much data as they can, even if it is only tangentially related to their prescribed activities. This enables them to apply the tools and practices of Big Data to convert portions of massive data sets into information that can be used to create strategies and achieve specific objectives.
How valuable is all of this data? A social scientist might say that the analysis of huge amounts of human behavioral data can help engineers to create more efficient cities, or provide law enforcement agencies with optimized strategies for crime prevention.
A government agency might claim that keeping these same types of data warehouses makes the country safer from the threat of terror attacks. A corporation would likely say that every bit of customer data it gathers makes it better at serving consumers.
These are fair and somewhat useful answers. But beyond these generous motivations, there are more troubling questions concerning the purpose and value of Big Data. For instance, let’s go back to IBM’s claim that we are generating 2.5 quintillion bytes of data every day. You might look at this claim and wonder:
● What exactly does this data consist of?
● How is this data being captured?
● Where is this data being sent for storage and analysis?
● What level of access should a given individual have to the data captured from his or her daily life?
● Does living in a technology-enabled world mean we have given our tacit consent to being monitored, with the data created from our activities being stored as a possible future asset?
Is Big Data sustainable?
Perhaps the most relevant and timely question concerning Big Data is this: How long can this level of data collection and storage remain sustainable?
If the human population continues along an exponential growth path, it is not a stretch to foresee the total amount of data generated in the world eventually outstripping the total available data storage infrastructure.
International Data Corp (IDC) has put some numbers together concerning this issue, and they are not promising. IDC predicts that the ratio of total global storage capacity to the amount of data being generated will fall by 50 percent by the year 2020.
There are possible technology breakthroughs that could postpone the point at which available storage runs out. One such technology, heat-assisted magnetic recording (HAMR), could potentially increase the top capacity of hard disks by up to 100x. Seagate is reportedly preparing a 16 TB HAMR hard drive for release in 2018. That’s a 33 percent increase over Western Digital’s helium-filled 12 TB hard drive, which was just released in April and represents the current state of the art in storage technology.
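The 33 percent figure follows directly from the two drive capacities. A minimal Python sketch of the calculation, using only the numbers quoted above (the function name is ours, chosen for illustration):

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100


HELIUM_DRIVE_TB = 12   # Western Digital's helium-filled drive
HAMR_DRIVE_TB = 16     # Seagate's reported HAMR drive

increase = percent_increase(HELIUM_DRIVE_TB, HAMR_DRIVE_TB)
print(f"Capacity increase: {increase:.1f} percent")
```

A one-third step up is a solid gain for a single product generation, but it is far from the up-to-100x ceiling that HAMR is said to make possible.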
Still, any storage technology improvements will likely pale in comparison to the additional rapid growth of data generation from two key sources: the Internet of Things (IoT) and the continuing proliferation of mobile devices across the world.
Think of your home 20 years ago. How many internet-connected devices were in your home? Maybe a single desktop PC. Maybe that plus a laptop. Today’s average family home can easily have a dozen devices online simultaneously. Laptops, smartphones, tablets, videogame consoles, media streaming boxes, smart televisions, voice-enabled personal assistants, networked security cameras, smart thermostats, digital lighting systems — the list goes on, and will only continue to grow over time.
The future of Big Data
It was Jeff Goldblum in the film Jurassic Park who delivered a line that had many people in the audience nodding their heads:
“(Y)our scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”
The fact that humanity is generating quintillions of bytes of data every day doesn’t mean there is an automatic benefit realized by capturing and storing all of it. There are, however, clear and legitimate privacy and consent issues involved with making every object in the world an internet-connected sensor.
As we continue to dedicate more time and energy to the “arms race” between Big Data technologies and the growing global data volume, perhaps a few questions are in order. Maybe all of the interested parties should take a moment to stop thinking about simply whether mass data collection and storage can be accomplished. Perhaps it’s time for a conversation about whether it should be accomplished.