In the most venerable of sciences, Physics, there are two closely linked concepts, that of work and that of energy. Work is the result of a force applied over a given distance, while energy is often seen as the result of work. However, energy takes a variety of forms, which enables us to produce work through the use of it, be it through a preexisting form (e.g. uranium and thorium) or some man-made form (e.g. a battery). This fundamental idea of the relationship between work and energy, which we often take for granted, is something that applies to data science as well, by substituting energy for value.
Value is sometimes considered as the 5th V of Big Data (the other four being Volume, Velocity, Variety, and Veracity), something that is quite inaccurate though since value is a fundamental characteristic of information, not a particular kind of data. Information, however, can be found even in relatively small datasets (which were considered large once, before the era of big data), so calling it a characteristic of big data can be misleading. This misconception doesn't take away any value from the idea of value though, which is often a value instilled in many data scientists, particularly those who go beyond the techniques and methods. These data scientists penetrate the essence of the craft, through the development of the data science mindset, which is the most valuable aspect of the field.
Value is something that concerns business people too, however, since it is one of the outcomes of a data science project, which ideally can translate into increased revenue, be it via the development of a new product or by making a business process more efficient. Also, value can enable an organization to expand its scope, know its customers better (KYC), and liaise with other organizations more effectively. This value, which often takes the form of insights, is at the core and oftentimes at the end of the data science pipeline.
Value, however, can take the form of a product, such as an API that automates a particular evaluation process or a prediction. Although the technology behind such a product is nothing spectacular (APIs have existed for a while now and they are fairly straight-forward for a software engineer to develop), the data science part of that product is what brings about the real value in such an API. Without a data science engine behind it, an API is bound to be more of an ETL tool which although still valuable, it's not of the same caliber of data science-powered APIs.
Value in data science is often found in the information distilled from the data, particularly through a predictive analytics model. Elements of it, however, are already encountered in the data discovery stage of the pipeline, where the data scientist evaluates the features at hand and the metadata available. This is often conducted through the creation of data models, which is why it is part of the data modeling part of the pipeline. I talk about all this in detail in the Data Science Modeling Tutorial, available on the O'Reilly (formerly known as Safari) platform.
Value in data science is a big topic and if I were to continue this article would be irksomely long. It would be best if I continue this in another article, or even a series of articles, in the weeks to come. Cheers!
Your comment will be posted after it is approved.
Leave a Reply.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.