This post is inspired by Joel Grus’s latest blog post on an interview of his and his showcasing of his Tensorflow know-how during it. Now, this interview was probably imaginary since I doubt anyone would be that foolish in an interview, but he is not afraid to make fun of himself to get a point across, something that is evident in most of his writings (including his book, Data Science from Scratch). I have no intention to promote him, but I find that his whole approach to data science is very fox-like, so it is only natural that he is mentioned in my blog!
In this dialectic post of his, Joel describes an interviewee’s efforts to come across as knowledgeable and technically adept, as he tries to solve the Fizz Buzz problem he is asked to whiteboard, using this Deep Learning package, along with Python. Of course, why someone would waste time asking him to solve such a simple problem is incomprehensible to me, but perhaps it’s for an entry level position or something. Still, if this were a data science position, the Fizz Buzz problem would be highly inappropriate as it has nothing to do with programming that’s relevant to data science. Joel goes on in his blog post to describe the great lengths he has to go to in order to get a basic neural network trained and deployed so that he can solve the problem, though even though he does nothing wrong (technically), his approach fails to yield the desired output and he fails the interview. That’s not to say that he or his tools are bad, but clearly illustrates the point he’s trying to make: advanced techniques don’t make one a good data scientist!
This is an issue with many data scientists today who have gotten intoxicated with the latest and greatest A.I. tech that’s found its way into data science. The tech itself is great and the tools it has been implemented with are also great. However, just because you can use them, it doesn’t make you a good data scientist. So what gives? Well, even though Deep Learning is a great framework for tackling tough data science problems, it fails miserably in the simpler ones, which are also quite common. Perhaps it’s the lack of data points, the fact that it takes a while to configure properly, or some other reason that depends on the problem at hand. Whatever the case, as data scientists we ought to be pragmatic and hands-on. Just because we know an advanced Machine Learning technique, it doesn’t mean that we should use it to solve all of the problems we are asked to solve. Sometimes we just need to come up with some simple heuristic and work with that.
There is an old saying that illustrates this issue the Joel describes in that post: killing a mosquito with a cannon. Yes, you may actually succeed in killing the poor insect with your fancy artillery weapon, but is that really cost-effective? Nowadays many data scientists go with the Deep Learning option because someone convinced them that it’s the best option out there in general, without sitting down for a minute and thinking if it’s the best option for the particular problem they are facing. Data science is not as simple and straight-forward an approach to problem-solving as some people make it out to be. So let’s get real for a minute and tackle problems like engineers, opting for a simple solution that works, before calling the cavalry for A.I. to help us. Being super adept may be appealing, but we first need to be adept at what we do by employing a down-to-earth approach that just works, before opting for improvements through more advanced models.
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy flair when it comes to technology, technique, and tests.