Contrary to what many people would have you believe, imagination in data science is not just for researchers and PMs. Every single aspect of a scientific project involves some imagination, even the most mundane and straightforward parts. Since data science involves a lot of creativity, at least for the time being, it is not far-fetched to presume that imagination has an important role to play in the field.
By imagination I mean the conscious use of the mind to project new forms, or to perceive forms that could exist but have not yet manifested. This is very different from the unconscious use of the mind, which psychologists refer to as fantasy: a fairly futile endeavor common among undisciplined and immature minds. As data scientists, we often need to see what is not there and either create it, if it's useful, or find some way to deal with it, if it's a potential issue.
Of course, there are hard-core Deep Learning (DL) enthusiasts who believe that with a good enough DL network you don't really need to worry about any of this. They advocate the idea that A.I. can take care of it all by handling every possibility in a feature set, systematically and/or stochastically, and yielding the optimal collection of features, which it then uses for the task at hand. Although a good enough artificial neural network (ANN) can certainly do all that, it still doesn't solve every potential issue, nor does it make the human's role unnecessary. A good motorbike can take a lot of the hard work out of getting from A to B, but it still doesn't eliminate the need for a rider who steers it and keeps it safely on the road.
Imagination is our navigator in many projects. Although it often lends itself to feature engineering and other data engineering tasks, it is also useful for something no A.I. has managed to achieve yet: developing hypotheses and a plan of action based on the data at hand. Data science is not all about getting a model working and achieving a good score on some performance metric. That is just one aspect of it, the one Kaggle focuses on to make its competitions more appealing. A large part of data science work involves exploring the data and figuring out what kind of insights it can yield. A robust A.I. system can be an invaluable aid in all that, but we cannot outsource this task to it, no matter how many GPUs we use or how slick its training algorithms are. Just as an organization cannot function properly if its members are all complete imbeciles (which is, in effect, what the components of an automated data science system are), a data science project needs some higher intelligence too: the equivalent of a competent manager in that organization.
We need to set goals for our project and foresee potential problems and opportunities before we reach the corresponding stages of the pipeline; otherwise, we risk going back and forth and wasting valuable resources. So, even though focused and meticulous work is essential, being able to step back and see the bigger picture is equally important. That's why a data science endeavor is often handled by a team of professionals, with the data science lead taking on that big-picture role. So, if you want to make things happen in data science, something an A.I. cannot do for you, you need to use imagination. Imagination, along with the systematic aspects of the role, can lead you to the desired outcome of your data science project, be it insights or a data product. Imagine that!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.