Just like other fields, data science has evolved over the past few years. One of the most evident aspects of this evolution is that data scientists are found in teams nowadays. Even consultancies are often team-based, enabling them to undertake a whole project flexibly and efficiently. But how do we build a data science team exactly? First, we need to look at the different types of data scientists and explore the different specialization levels such a professional may have.
Nowadays, there are several types of data scientists. The most important of them are the data engineer (delving into low-level tasks, such as ETL and handling any cloud-related operations) and the data modeling expert (usually referred to as just a data scientist, or as a machine learning expert when the role is more specialized). Additionally, there are the data visualization expert, the data science manager, and the data communicator (a more niche role that's not as widespread). Of course, depending on the data science area that a data scientist specializes in, there is also the NLP expert, the A.I. expert, etc. So, it's safe to say that the data scientist role is quite diverse these days.
Speaking of specialization, that's a topic on its own that plays a role in data science work. The specialist is the most common scenario, whereby a data scientist is really good at one particular task and fairly mediocre at other tasks not related to it. On the other hand, a generalist is quite decent at various tasks but not particularly good at any specific one. Such a person may be a good team leader, but wouldn't be ideal for tackling a particularly challenging problem. Beyond these two, there is also the versatilist, who is quite good at one (or more) tasks but also quite decent at others. It's like a combination of a specialist and a generalist, making for an excellent asset in a team, especially in data science work.
So, how do we go about building a data science team? The team's specifics always depend on the project at hand, but in general terms, you can build a team as follows. For starters, you need to get a versatilist or experienced generalist as the team leader. This person can help build the team by finding professionals with a similar working style and cultural fit. Having a second generalist or versatilist may also be useful, depending on the size of the team. Additionally, you can have two or three specialists, one of whom would need to be a data engineer. If your team needs to work with clients directly, you may need to consider having a data communicator. Also, if the team's expected outcomes are more geared towards dashboards and graphics, you may need to have a data visualization expert on board.
Should you wish to learn more about this topic and other organizational aspects of the data science field, you can check out the Data Scientist Bedside Manner book I co-authored last year. This book examines various aspects of data science work, focusing on the non-technical ones and various useful tips as to how you can improve your data science career. Check it out when you have the chance. Cheers!
As you may have noticed, data analytics is always evolving as a field, so it's not surprising to see data science change from year to year. What was hot and trendy in 2020 may not be as prominent soon and vice versa. That's not to say that you should expect to see drastic changes in 2021, but it's good to adapt your expectations, taking into account the latest trends. In this article, we'll explore all that and see how you can benefit from these insights for your career in data science.
It's no secret that deep learning is gaining even more popularity, particularly in time series analysis. So, RNNs are bound to rise in demand as a skill, particularly if you are involved in the field's forecasting part. Additionally, healthcare seems to be becoming more aligned with this tech, so it is expected that more medicine-related organizations are going to be looking for data scientists to join their ranks. What's more, IoT is expected to incorporate AI, making our work more relevant to infrastructure projects. Moreover, we should expect Reinforcement Learning (RL) to grow further, as its use cases, such as chat-bots, are growing in popularity. Finally, it seems that more and more people are becoming aware of the benefits of data science and AI, so it's easier than ever to make the business case for advanced analytics in a company. At the same time, the cloud provides a viable solution for the hardware required, something that's bound to stick in the coming years.
Based on the above, it’s reasonable to deduce that the data science specializations most likely to be relevant this coming year are AI expert, data engineer (particularly the one geared towards machine learning, aka machine learning engineer), and data scientists with domain knowledge in healthcare and IoT. Naturally, Natural Language Processing experts are bound to remain in demand, particularly if they possess chat-bot know-how.
Beyond all that, it’s important to remember that the one thing that’s bound to remain relevant in the years to come, regardless of these trends, is the data science mindset. This mindset involves various aspects, such as problem-solving skills, creativity applied in analytics work, meeting deadlines, and collaborating with other data professionals, to name a few. The data science mindset is our attitude towards the data science problems we have to solve. As such, it's something essential and perhaps more relevant than whatever skill is in vogue at any given time.
You can learn more about the data science mindset and other relevant topics in this field through one of my books, titled Data Science Mindset, Methodologies, and Misconceptions. There I explore the various aspects of the field without getting too technical, all while highlighting those skills that make up the data science mindset. I include some soft skills and some hard ones that are still relevant today, even if some of the tools have evolved since then. So, check it out when you have a moment. Cheers!
Natural Language Processing (NLP) is an essential part of data science today. Although its focus is on analyzing text, its benefits go beyond this and extend to improving an existing text. In a world where written communication is becoming more prevalent, this is a powerful aspect of the field. In this article, we'll look into all that from a practical and not-too-technical perspective.
Let’s start with what NLP is. NLP is a specialized field at the overlap of data science and A.I., geared towards analyzing text data, mainly text forming complete sentences. NLP aims to understand things like the tone of the text, its sentiment polarity, and its intention. Specialized aspects of NLP focus on understanding the meaning of the text to provide more in-depth insights regarding it. This kind of NLP falls under the Natural Language Understanding (NLU) umbrella, and it's a more advanced aspect of NLP. Other advanced aspects of NLP involve creating new text based on a given text or sometimes even just a prompt.
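To make the idea of sentiment polarity a bit more concrete, here is a minimal sketch in Python. It is a toy, lexicon-based scorer and not how production NLP systems work; the word lists and the `sentiment_polarity` function name are assumptions made up for this illustration.

```python
import re

# Hypothetical, hand-picked word lists used only for this illustration
POSITIVE_WORDS = {"good", "great", "excellent", "useful", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "useless", "hate"}


def sentiment_polarity(text: str) -> float:
    """Return a score in [-1, 1] based on counts of known positive and negative words.

    The score is (positives - negatives) / (positives + negatives),
    or 0.0 when the text contains none of the known words.
    """
    # Lowercase the text and split it into simple word tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = positives + negatives
    if matched == 0:
        return 0.0
    return (positives - negatives) / matched
```

A real system would typically rely on a trained model rather than fixed word lists, since a scorer like this one cannot handle negation, sarcasm, or context.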
A given text can be improved in various ways. Apart from the apparent corrections (e.g., typos and incomplete sentences), it can be made clearer, less wordy, and more elegant. Changes like these involve improving the vocabulary used as well as the sentence structure. To automate this process, some NLP work is necessary. Additionally, the user's feedback can be incorporated to enhance the text further, mitigating any inaccurate improvements. What's more, additional improvements from a human editor can be incorporated into the NLP model, even if that editor is the original text's creator. In any case, this is a long process that involves considering various factors, such as the audience the text is targeted at, the objective of the text, etc. That's why, to improve a text, you need a systematic and sophisticated approach, one that is versatile enough to adapt and evolve. In short, you need an A.I. system designed for this specific task.
All this may seem like a lot of work for just making a piece of text look nicer, perhaps a bit of an overkill. However, considering the effects of this work, it may be an excellent investment. In particular, when the system provides suitable corrections to the user (who is also the original text's creator), she can improve her writing style and mastery of the language. This is particularly the case when the user is not well versed in linguistics and makes many mistakes. So, this simple NLP pipeline, which is also mostly self-sufficient, can improve the user too, all while enabling her to spend her time on other, more challenging tasks. What's more, a good text can help communication among people, effectively making this NLP work an excellent time-saver for everyone who deals with the text.
But where can someone find such an NLP system that can improve a given text and the person who wrote it, eventually? Well, Grammarly has you covered in that regard. This company has developed a powerful A.I. system that does just that, all while having an intuitive and easy-to-use interface that integrates well with your web browser. Having used this system myself for over a year now, I can attest to its usefulness and insightful feedback. Check it out when you have the chance. Cheers!
Data analytics is the field of analyzing data and using any insights you've discovered to facilitate an organization's workflow. It has a very hands-on approach to things and focuses on describing a problem accurately and liaising with the stakeholders to drive decisions based on the data analyzed. Data science is akin to that, while it employs the scientific method to go deeper into the data and develop more sophisticated strategies to drive those decisions.
Nowadays, the roles of the data analyst and the data scientist are somewhat mixed since the latter is still relatively new. The fact that its evangelists haven't publicized it accurately makes it even more challenging to understand how it fills a somewhat different niche as a role. The data analyst is more widespread as a role and is tied to a large variety of tasks, including marketing analytics (e.g., SEO) and business intelligence (BI) work. An organization usually leverages a data scientist in cases with more unstructured data (e.g., text) or data in various forms, making data wrangling a necessary part of the analysis. Also, in scenarios where the objectives are not as clear-cut (e.g., predict the sales of the next quarter), a data scientist is usually preferred. However, it's worth pointing out that many data scientists start their careers as data analysts and that both roles are necessary. After all, they both work with data to produce insights, even if they often go about it differently.
Beyond the differences mentioned previously, another critical difference between the two roles is the models used. Namely, the data analyst is more geared towards describing the data and understanding the problem it represents. The data scientist goes a step further, digging deeper into the data and also making predictions (through mathematical models, usually based on machine learning). As a result, she can put together a complete solution, such as a predictive model accessible through an API. Naturally, she can also create a dashboard, something that is among the data analyst's deliverables.
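The contrast between describing data and predicting from it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quarterly sales figures are made up, the function names are hypothetical, and a plain least-squares line stands in for the machine learning models a data scientist would more likely use.

```python
from statistics import mean

# Hypothetical quarterly sales figures, made up for this illustration
quarters = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]


def describe(values):
    """Descriptive view: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Predictive view: estimate a value that has not been observed yet."""
    return slope * x + intercept


# The analyst-style summary of past quarters
summary = describe(sales)

# The scientist-style forecast for the next (fifth) quarter
slope, intercept = fit_line(quarters, sales)
next_quarter_forecast = predict(5, slope, intercept)
```

Here `summary` only looks backward, whereas `next_quarter_forecast` extrapolates the fitted trend to a quarter that has not happened yet.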
What's more, although data analysts can tackle all sorts of data, they usually draw the line at text and semi-structured data. Datasets of this sort require specialized methods, such as Natural Language Processing (NLP), which falls in the data scientist's domain. The use of AI is often essential in problems like this, something that a data scientist is usually required to know but that is beyond the job description of a data analyst.
You can learn more about data science and the data scientist's role through a couple of my books. Specifically, the book Data Scientist: The Definitive Guide to Becoming a Data Scientist illustrates the ins and outs of this role and offers some practical advice as to how you can pursue a career in this field. Additionally, the Data Science Mindset, Methodologies, and Misconceptions book showcases the field overall and its defining aspects as well as the essential techniques used. Both books together offer a bird's-eye view of data science and how you can build your career in it. Check them out when you have a chance. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.