Many people argue that data science’s main purpose, particularly in a business setting, is to mine and deliver insights. Unlike data products (another type of data science deliverable), insights are fairly straightforward and require little software development (something often outsourced to the dev team). However, their value is debatable, since few insights are actually used in real-world projects.
An insight is generally a non-trivial conclusion that stems from rigorous analysis of a data stream, be it with A.I. techniques (e.g., a deep learning network), some other machine learning methodology (e.g., an unsupervised learning system), or even a statistical process (e.g., a chi-square test). By definition, it is not something you can pinpoint just by plotting the data or calculating a superficial metric such as the mean or standard deviation (which are fine in themselves, but insufficient for generating useful insights).
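To make the contrast with superficial metrics concrete, here is a minimal sketch of one such statistical process: Pearson's chi-square test of independence, computed by hand in plain Python. The contingency table (customer segment versus churned/stayed) is entirely hypothetical and only serves as illustration, and the critical value 3.841 applies only to a 2x2 table (one degree of freedom) at a 5% significance level. In practice you would likely reach for a library routine rather than this hand-rolled version.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the assumption that rows and columns are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = customer segment A/B, columns = (churned, stayed)
table = [[30, 70],
         [55, 45]]

stat = chi_square_statistic(table)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)

# Critical value for df = 1 at alpha = 0.05; only valid for a 2x2 table like this one
CRITICAL_VALUE_DF1_ALPHA05 = 3.841

print(f"chi2 = {stat:.2f}, df = {degrees_of_freedom}")
if stat > CRITICAL_VALUE_DF1_ALPHA05:
    print("association detected between segment and churn")
else:
    print("no evidence of association")
```

Comparing the two segments' churn rates alone would only describe the data; the test asks whether the difference is larger than chance would plausibly produce, which is closer to what the article calls an insight.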
It helps to distinguish between the various aspects of an insight’s value. First of all, there is the inherent value of the insight. This is essentially a signal in the analyzed data, or some interpretation of it. This kind of value matters primarily to the data scientist and others involved hands-on in the project. If the data science project is related to research, this kind of insight can be the basis of a publication. However, an insight that has merely inherent value is often not enough.
Another aspect of an insight’s value is its commercial application. This is significantly more important for the majority of data science projects. The reason is that someone is paying for the project, and it is insights of this kind that eventually bring about a positive ROI. The data scientist may not necessarily value the commercial aspect of the insights they deliver, but the project manager certainly does, as do the project’s other stakeholders.
Finally, there is the practical value of the insight. Whether or not the insight has commercial value, it may enable the development of something tangible, like a data product, or a deeper understanding of the problem at hand. This kind of value can kick off a new cycle of the data science process, which is likely to bring about new insights and, with them, additional value.
Whatever the value of the insights, it is important to remember that one’s work shouldn’t be judged entirely by them. Surely it’s great if you can produce something actionable, or something that sheds light on the problem investigated, but if the available data streams are as noisy as the screen of a TV that isn’t tuned to any channel, there is not much you can do with them. After all, the “garbage in, garbage out” (GIGO) rule familiar to many software developers applies to data science as well. If you want valuable insights, you need data streams that contain some useful signal; otherwise, you are just wasting your time.
What are your insights on this matter?
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.