People like to talk about the V’s of big data, since the topic is comprehensible to almost everyone and also provides insight into the benefits of using data science in an organization. Naturally, these benefits are linked to having access to various data streams, which usually result in massive amounts of data, commonly referred to as big data. Not everyone agrees on which V’s are valid for characterizing this valuable resource (some say there are four, others exclude Veracity, while others include a couple more). However, there seems to be a consensus about the last V, namely Value. Nevertheless, whether there is value in big data or not is something that remains to be determined, since not all big data is created equal.
The issue with the V of Value is that it’s not inherent in the data. If it were, someone could just buy this data (or license it) and automatically improve their organization’s ROI. The value of big data actually stems from data science’s transformation of this data into insights and/or data products. The same data that would otherwise be gathering dust on some computer cluster somewhere is turned, through data science, into something people can use and oftentimes monetize. This takes effort, however, and most importantly, it requires a certain quality in the data to begin with.
It’s often useful to think of data as a gold mine. Just because a mine has the potential to yield large amounts of the valuable metal, it doesn't mean that it will. Perhaps the mine is all dried up, or didn’t have much gold to begin with. No amount of data science can remedy that. Data science can yield something of value only if there is something in the data that could be of value. Many times people forget that, just like those who buy a gold mine and expect to be swimming in gold soon enough.
The other V’s of big data, on the other hand, are real and present in every data stream that qualifies as big data. In fact, they are characteristics of the data itself, rather than something dependent on data science. The V’s themselves may provide some insight into the extent to which the data at hand qualifies as big data, but they say little about its potential for an organization. For example, high-veracity big data about people’s views on a particular commercial product may be completely useless to an organization whose business is a service. The data itself is fine, but it doesn't add value to that organization.
So, in order for big data to be of actual value, we need certain things to be in place. First of all, the data needs to be handled by a data science team (or a single data scientist, if they are competent enough). Moreover, it needs to have some affinity to the organization’s domain. Finally, there needs to be something insightful in the data that can be surfaced through a data science project, be it a better understanding of a situation or a data product that the organization can use.
In conclusion, the fact that some data stream can offer value doesn't necessarily mean that it will. After the data science team has done its part, the stakeholders of the project need to take action, utilizing the insights and/or the data product developed. People sometimes forget that and neglect to leverage the benefits of a data science project to the fullest extent, much like a gold miner who obtains the gold from a mine but never gets around to doing anything useful with it...
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.