McKinsey, in “Big Data: The next frontier for innovation, competition, and productivity,” reports that “The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.”
What is Big Data? Do you need it? What do you have to do to get it? Do you already have it?
Gartner defines Big Data this way: “Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
This pretty much answers the next two questions. You need Big Data if the decisions you make require more data, or the results of analyzing more data. To get it, you need to collect it.
The definition says nothing about how you collect it. Logging data at tasks, as the tasks are being performed, is one good, obvious way. It costs money to do this, so go easy: don’t collect data that is not needed to make decisions (current or future), because collecting it could slow down task performance, not to mention the demand it creates for processing, storing, analyzing and reporting on that data.
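Here is a minimal sketch of the “go easy” advice, assuming task events arrive as Python dicts. A hypothetical whitelist, `DECISION_FIELDS`, keeps only the fields tied to a current or foreseeable decision and drops everything else before anything is written. The field names and the in-memory list standing in for a log store are illustrative assumptions, not part of any particular product.

```python
import json
import time

# Hypothetical whitelist: only fields that feed a current or anticipated decision.
DECISION_FIELDS = {"task_id", "case_id", "performed_by", "outcome", "duration_s"}


def log_task_event(event: dict, sink: list) -> dict:
    """Record a task event, keeping only whitelisted fields plus a timestamp."""
    record = {k: v for k, v in event.items() if k in DECISION_FIELDS}
    record["logged_at"] = time.time()  # when the event was captured
    sink.append(json.dumps(record, sort_keys=True))  # stand-in for a real log store
    return record


log = []
rec = log_task_event(
    {"task_id": "T-17", "case_id": "C-3", "outcome": "approved",
     "duration_s": 42, "screen_color": "blue", "mouse_moves": 1311},
    log,
)
print(sorted(rec))  # ['case_id', 'duration_s', 'logged_at', 'outcome', 'task_id']
```

The two fields no decision depends on (`screen_color`, `mouse_moves`) never reach storage, so they cost nothing to process, store, analyze or report on later.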
It’s worth reading the Gartner definition again. Clearly, data must be timely to be useful for making decisions, so “new forms of processing” implies infrastructure upgrades to address volume and velocity. But then we see “. . . high variety information assets”.
I take this to be a reference to the information content. Whereas we used to ship/receive and collect mostly fixed-field data in relational database management system (RDBMS) tables, today we have “Cases” that need to be able to accommodate fixed-format data, files, processed data, images, video/audio recordings and more.
Anyone searching for “supplier address” in an RDBMS would not expect to find a supplier address anywhere other than the “supplier address” field. In an environment that accommodates unstructured as well as traditional structured data, you might find a supplier address almost anywhere. This makes it necessary to index all significant data so that it can be found wherever it is.
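To make “index all significant data” concrete, here is a minimal inverted-index sketch: each word maps to the set of items that contain it, so a query for “supplier address” finds every item holding both words, whether the text came from an e-mail or from OCR of a scanned invoice. The item IDs and texts are made up for illustration. This toy class matches words anywhere in an item rather than exact phrases; a production system would use a full-text search engine instead.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self._postings = defaultdict(set)  # word -> set of item IDs containing it

    def add(self, item_id: str, text: str) -> None:
        for word in tokenize(text):
            self._postings[word].add(item_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of items that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


idx = InvertedIndex()
idx.add("email-0193", "Please update the supplier address to 12 Main Street")
idx.add("scan-0007", "Supplier address: 40 Dock Road")  # OCR text from an invoice
idx.add("note-0002", "Call the supplier tomorrow")
print(sorted(idx.search("supplier address")))  # ['email-0193', 'scan-0007']
```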
This brings us to the last question: do you already have it?
The answer is, it depends. A large bank with millions of customers has Big Data relative to a large investment company whose clients are risk-averse retirees.
I suspect that I have Big Data. One of my night jobs is producing archival video recordings of stage events, each of which lasts about 1 ½ hours and generates about 13 GB of HD video per hour. Editing and backups roughly double this number.
I have a computer that is somewhat optimized for video editing and disc generation. At the time of writing, it has three 3 TB drives plus one internal 1 TB drive. All are full.
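A back-of-the-envelope check of these numbers, assuming decimal units (1 TB = 1,000 GB, as drive makers count) and taking “roughly double” as exactly 2×. It estimates how many events the four drives can hold; it ignores file-system overhead and everything else stored on the drives.

```python
HOURS_PER_EVENT = 1.5    # length of one stage event
GB_PER_HOUR_HD = 13      # HD footage generated per hour
OVERHEAD_FACTOR = 2.0    # editing and backups roughly double the raw footage
DRIVES_TB = [3, 3, 3, 1]  # three 3 TB drives plus one internal 1 TB drive


def footprint_gb(hours: float, gb_per_hour: float, overhead: float) -> float:
    """Total storage one event consumes, including edits and backups."""
    return hours * gb_per_hour * overhead


per_event = footprint_gb(HOURS_PER_EVENT, GB_PER_HOUR_HD, OVERHEAD_FACTOR)
capacity_gb = sum(DRIVES_TB) * 1000
events = int(capacity_gb // per_event)
print(f"{per_event:.1f} GB per event; about {events} events fill {sum(DRIVES_TB)} TB")
# prints "39.0 GB per event; about 256 events fill 10 TB"
```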
Forces are hard at work trying to drag videographers from the current 1920 × 1080p HD format to 4K recording. Anything to sell more cameras and higher-definition TVs.
If these forces succeed, storage requirements for video will increase by a factor of about four (4K UHD has four times as many pixels as 1080p), and more people will have Big Data.
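The factor of four follows from pixel counts. The sketch below assumes consumer 4K UHD (3840 × 2160) rather than cinema 4K (4096 × 2160) and reuses the per-event figures given earlier (1 ½ hours at 13 GB per hour, doubled for editing and backups). As a simplification, it also assumes file size grows in proportion to pixel count; real codecs will not scale exactly linearly.

```python
HD = (1920, 1080)
UHD_4K = (3840, 2160)  # assumption: consumer 4K UHD, not DCI 4K (4096 x 2160)


def pixel_ratio(a: tuple[int, int], b: tuple[int, int]) -> float:
    """How many times more pixels frame size b has than frame size a."""
    return (b[0] * b[1]) / (a[0] * a[1])


ratio = pixel_ratio(HD, UHD_4K)
hd_gb_per_event = 1.5 * 13 * 2               # 1.5 h at 13 GB/h, doubled
uhd_gb_per_event = hd_gb_per_event * ratio   # assumes size scales with pixel count
events_in_10_tb = int(10_000 // uhd_gb_per_event)
print(ratio, uhd_gb_per_event, events_in_10_tb)  # 4.0 156.0 64
```

Under these assumptions, the same 10 TB that holds about 256 HD events would hold only about 64 in 4K.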