Although transparency is often viewed in relation to predictive analytics models, when it comes to data science there is another aspect of transparency that is particularly important: the transparency of the data science work itself. This concerns mainly how transparent the process followed is, as well as the models used and the results they yield.
Nowadays it’s easy (perhaps too easy!) to build a predictive analytics model in complete obscurity, thanks to the wonder of deep learning. This way, you may be able to bring about a satisfactory result without gaining a sufficient understanding of the data at play or the quirks of the problem at hand. Of course, the data scientist is not the only one to blame for this reckless behavior. Far from it. The root of the problem is managerial, since we often forget that the data scientist will tend to follow the most economical course of action and deliver a result, be it a data product or a set of insights, in the least amount of time. This is often due to the strict deadlines of a data science project and the all-too-frequent lack of understanding of the field among the people managing the project.
There is more to a successful project than a high accuracy rate or an easily accessible model on the cloud. Oftentimes the problems tackled by data science are complex and have lots of peculiarities that deserve close attention if the problems are to be solved properly. Anyone can build a predictive analytics model nowadays without having a good grasp of data science, thanks to all those 10- to 12-week boot camps that offer aspiring data scientists the most superficial knowledge humanly possible! Yet, if our expectations of data scientists are equally shallow and we are willing to put up with opaque models and pipelines, then we reap what we sow. That's why it's important to communicate well about these matters, going beyond the basics. Mentoring can also be a priceless aid in all this.
Fixing this fundamental issue requires more than just good communication and mentoring, however. We also need to opt for a transparent approach to data science. All aspects of the pipeline need to be explainable, even if the models themselves are black boxes because of the performance requirements involved. Data scientists need to be able to communicate their work and findings, while we as managers need to do the same when it comes to requirements, domain knowledge, and other factors that may play a role in the project at hand. All this may not solve every issue with today’s obscure data science pipelines, but it is a good place to start.
Perhaps if we make transparency a key value in our data science teams, we will have a better chance of deriving true insights from the available data and of bringing about a more valuable result overall.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.