Nowadays, many people think that data science is as simple as most of its evangelists claim, so they end up making avoidable mistakes in their work. Every data-related profession requires focus and attention, even more so when it involves complex problems, as data science does. In this article, we'll explore some of the most common mistakes data scientists make in their day-to-day work, along with some suggestions on how to remedy them. If this article proves popular, I may write a follow-up exploring additional mistakes data scientists make.
First of all, many data scientists carry the illusion that the world is like a Kaggle competition, where the data is relatively clean and tidy and the only thing that matters is a performance metric, such as accuracy or mean squared error. Although such competitions offer valuable practice, especially when you are learning about data models, real-world data science projects involve much more. Paying attention to data engineering, particularly data cleaning and data exploration, is therefore vital for data science work. Communicating your findings is equally crucial, whether through a report or presentation, through comments in your code, or through any other text that accompanies it.
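To make the contrast with competition data a bit more concrete, here is a minimal sketch of the kind of basic data-quality check and cleanup that real projects often need before any modeling. It assumes records stored as plain Python dicts; the `records` data and the `profile` and `clean` helpers are hypothetical illustrations, not part of any particular library or workflow.

```python
from collections import Counter

# Hypothetical raw records showing issues that real-world data often has
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "us"},
    {"id": 2, "age": None, "country": "us"},  # exact duplicate row
    {"id": 3, "age": -5, "country": "DE"},    # implausible value
    {"id": 4, "age": 51, "country": " de "},  # inconsistent formatting
]


def profile(rows, numeric_field, valid_range):
    """Count a few common data-quality problems in a list of dict records."""
    missing = sum(1 for r in rows if r.get(numeric_field) is None)
    # Rows are compared as sorted (key, value) tuples, so key order doesn't matter
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    lo, hi = valid_range
    out_of_range = sum(
        1 for r in rows
        if r.get(numeric_field) is not None and not lo <= r[numeric_field] <= hi
    )
    return {"missing": missing, "duplicates": duplicates, "out_of_range": out_of_range}


def clean(rows, numeric_field, valid_range, text_field):
    """Normalize a text field, treat implausible numbers as missing, drop duplicates."""
    lo, hi = valid_range
    seen = set()
    cleaned = []
    for r in rows:
        r = dict(r)  # copy so the raw data stays untouched
        r[text_field] = r[text_field].strip().upper()
        value = r.get(numeric_field)
        if value is not None and not lo <= value <= hi:
            r[numeric_field] = None
        key = tuple(sorted(r.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


if __name__ == "__main__":
    print(profile(records, "age", (0, 120)))
    print(profile(clean(records, "age", (0, 120), "country"), "age", (0, 120)))
```

Whether implausible values should become missing, be dropped, or be investigated further is a judgment call that depends on the project; the sketch simply picks one option to show that such decisions exist.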
Another mistake many data scientists make is using models without understanding them well enough. This superficial knowledge is especially common with machine learning models, particularly AI-based ones such as the various kinds of artificial neural networks. Of course, the libraries at your disposal will do most of the heavy lifting, but you still need to know what they are doing, how the various hyperparameters come into play, and what the outputs mean. A solid understanding of how the models work under the hood also makes troubleshooting more straightforward and efficient. So, for best results, learn the theory behind the models, perhaps even code one from scratch yourself, and use them properly.
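As an example of coding a model from scratch, here is a minimal sketch of one-feature linear regression trained by batch gradient descent on mean squared error. The choice of model, the function names, and the toy data are illustrative assumptions; the point is to see how two hyperparameters, the learning rate and the number of epochs, directly shape the result.

```python
def fit_linear_regression(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every data point
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]  # toy data generated from y = 2x + 1
    w, b = fit_linear_regression(xs, ys)
    print(f"w={w:.4f}, b={b:.4f}, mse={mse(xs, ys, w, b):.2e}")
    # A learning rate that is too large makes the same procedure diverge
    w_bad, b_bad = fit_linear_regression(xs, ys, learning_rate=0.2, epochs=50)
    print(f"diverged mse={mse(xs, ys, w_bad, b_bad):.2e}")
```

With the default settings the fit recovers a slope close to 2 and an intercept close to 1, while a learning rate of 0.2 makes the error explode on this data. Seeing that failure in a few lines of your own code makes it much easier to recognize when a library model misbehaves for similar reasons.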
Moreover, many data scientists don't understand the business side of things, as they focus too much on the coding and math aspects of the field. As a result, they sometimes fail to solve the problems they actually need to solve, because they misinterpret the data project's requirements. This kind of misunderstanding is especially costly when the project has a tight deadline, since there may be little room for revisions. So, instead of looking only at the technical aspects of data science, learn to "speak the business language" better so that you can set more accurate goals for your data science project and its corresponding tasks. This kind of communication is a valuable transferable skill, by the way.
Finally, going through the motions as if data science were a mechanical process, devoid of curiosity and creativity, can be a deadly mistake. It won't literally kill you, but it will make the whole experience of working in the field lifeless and uninteresting, and it's hard to build a career under these circumstances. This mistake is really a symptom of a deeper problem: a shallow understanding of the field, particularly of its mindset. If you view data science mechanically, you probably haven't understood it well enough to appreciate it and, to some extent, feel inspired by it. Remedying this mistake involves exploring the field's various methodologies in depth and cultivating a genuine interest in it, starting with real curiosity.
Although not a panacea, learning more about data science and the right mindset behind it can help you avoid most of the mistakes data scientists commonly make. This knowledge and know-how can also help you lay strong foundations for your data science work and develop your skill set effectively and efficiently. So, check out my book "Data Science Mindset, Methodologies, and Misconceptions" and spread the word about it if you are so inclined. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.