The data scientist and the data analyst both deal with data analysis as their primary task, yet those two roles differ enough to warrant an entirely different set of expectations for each. Both share common attributes and skills, however, making them more similar than people think. This similarity allows a relatively more straightforward transition from one role to another, if needed, something not everyone realizes. This article explores this situation's details and makes some suggestions as to how each role can benefit the other.
The two roles are surprisingly similar, in ways going beyond the surface kinship (i.e., data analysis). Data scientists and data analysts deal with all kinds of data (even though text data is not standard among data analysts), often directly from databases. So, they both use SQL (or some SQL-like language) to access a database and obtain the data needed for the project at hand. Both kinds of professionals clean and format the data to some extent, be it in a programming language (e.g., Python or Julia) or in some specialized software (e.g., a spreadsheet program, in the case of data analysts). Also, both data scientists and data analysts create visuals and presentations containing these graphics. Finally, both kinds of professionals write reports or some form of documentation for their work and share it with the project's appropriate stakeholders.
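To make the shared workflow a bit more concrete, here is a minimal sketch of pulling data with SQL and then cleaning it in Python. The `sales` table, its columns, and the sample rows are all made up for illustration, and an in-memory SQLite database stands in for whatever database a real project would use.

```python
import sqlite3


def fetch_clean_sales(conn):
    """Query the (hypothetical) sales table and tidy up the rows."""
    rows = conn.execute(
        "SELECT region, amount FROM sales "
        "WHERE amount IS NOT NULL ORDER BY rowid"
    ).fetchall()
    # Basic cleaning: trim whitespace and normalize capitalization
    return [(region.strip().title(), float(amount)) for region, amount in rows]


# In-memory database with invented rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(" north ", 120.5), ("SOUTH", None), ("east", 80)],
)

cleaned = fetch_clean_sales(conn)
print(cleaned)
```

An analyst might do the cleaning step in a spreadsheet instead, but the query-then-tidy pattern is the same in both roles.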
Despite the sophistication of our field, we data scientists can learn some things from data analysts. The new generation of data scientists in particular, coming out of bootcamps or from a programming background, has a lot to gain from these professionals. Namely, data analysts are closer to the business side of things and often have domain knowledge that data scientists don't. After all, data analysts are more versatile as professionals in terms of employability, making them more likely to gather experience in different domains. Also, data analysts tend to have more developed soft skills, particularly communication, as they have more opportunities to hone them. Learning all that can benefit any data scientist, especially those who are new to the field.
Additionally, data analysts can learn from data science professionals too. Specifically, the value of the in-depth analysis that we do as data scientists is something every analyst can undoubtedly benefit from. In particular, data engineering is the kind of work that adds a lot of value in data science projects (when it's done right) and something we don't see that much in data analytics ones. What's more, predictive modeling (e.g., using modern frameworks, such as machine learning) is found mostly in data science, yet it is something a data analyst can apply. Once someone has the right mindset (aka the data science mindset), it's not too difficult to pick up those skills, particularly if they are already versed in data analytics.
If you wish to learn more about the soft skills and business-related aspects of data science, you can check out one of my relatively recent books, Data Scientist Bedside Manner. In this book, my co-author and I look into the organization hiring data scientists, the relevant expectations, and how such a professional can work effectively and efficiently within an organization. So, check it out if you haven't already. Cheers!
The data scientist role is an incredibly important one in the world today. Be it in for-profit organizations or non-profit ones, it has a lot of value to add and aid decision-making. However, it's still unclear what exactly it entails and how someone can become a data scientist starting from a data analytics background.
The data scientist is a tech professional who processes data, especially complex data, in large amounts (aka big data) to derive insights and build data products. This role involves gathering data, cleaning it up, combining it with other relevant data, evaluating the features involved, and building models based on them, usually to predict some variable of interest or solve some complex problem. It also involves creating insightful visuals and presenting your findings to the project stakeholders, with whom you often need to liaise throughout the data science projects. For all this work, you need to use a lot of programming and various data analysis methods, particularly machine learning.
To transition to the data scientist role from the data analyst one, you need to beef up your programming skills and work on your data analysis methodologies. Learning more techniques for pre-processing data (data engineering) is also essential. What's more, you need to familiarize yourself with various methods for depicting data, such as graphs, and how to process the data in these sorts of encodings. Dimensionality reduction methods are also vital for assuming the data scientist role, just like various sampling techniques. Furthermore, handling data in different formats (e.g., JSON, XML, and text) is essential, particularly in projects that deal with semi-structured data. Naturally, having some familiarity with NoSQL databases is also very important, as it goes hand-in-hand with this sort of data.
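As a small illustration of handling data in different formats, the sketch below reads the same kind of record from JSON and from XML using only Python's standard library. The field names (`name`, `score`), the `entry` element, and the sample payloads are invented for this example.

```python
import json
import xml.etree.ElementTree as ET


def records_from_json(text):
    """Turn a JSON array of objects into (name, score) tuples."""
    return [(item["name"], float(item["score"])) for item in json.loads(text)]


def records_from_xml(text):
    """Turn <entry name="..."><score>...</score></entry> elements into tuples."""
    root = ET.fromstring(text)
    return [
        (entry.get("name"), float(entry.findtext("score")))
        for entry in root.iter("entry")
    ]


# Invented sample payloads; note the score stored as a string in the JSON
json_text = '[{"name": "a", "score": 1.5}, {"name": "b", "score": "2"}]'
xml_text = '<entries><entry name="c"><score>3.0</score></entry></entries>'

records = records_from_json(json_text) + records_from_xml(xml_text)
print(records)
```

Bringing every source into one common shape early on (here, plain tuples) tends to make the later modeling steps simpler, whichever format the data arrived in.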
Naturally, all this is the tip of the iceberg when it comes to transitioning into a data scientist from a data analyst. To make sure this transition is solid enough to build a career on top of it, you need to develop other skills and a good understanding of the complex data involved in data science projects. Being able to communicate with other data professionals well and understand them is also very important. Nowadays, you often have to work as part of a data science team, which involves a certain specialization level. So, having such expertise is significant, at least for certain data scientist positions.
You can learn more about this topic by reading my first book on data science, namely Data Scientist: The Definitive Guide to Becoming a Data Scientist. This book covers various topics related to the data scientist role and has a whole section dedicated to similar roles. It is also written in an easy-to-follow way, without too much technical jargon, and it has a glossary at the end. Interviews with data scientists of various levels help clarify the role's details and what it is like on a practical level. So, check it out when you have a moment. Cheers!
A data product is the main deliverable of data science and some data analytics projects. It involves developing a stand-alone piece of software, often with a data model under the hood. Other times, it takes the form of a set of visualizations that depict particular variables of interest or other useful insights. In any case, data products are vital as they constitute an essential part of a data science project and a useful deliverable in a data analytics project (even if it's not always a requirement).
Dashboards are a kind of data product, featuring graphics and an intuitive (albeit minimalist) interface. They sometimes involve some control element that enables the user to change some settings and adjust the related graphics to different operating conditions. This element provides a more dynamic aspect to the dashboard, which augments the innate dynamism they have. The latter stems from the fact that they are usually linked to a dataset that changes over time, as new data becomes available.
The popularity of dashboards illustrates data visualization's value, be it in data science or data analytics. It's hard to imagine a project like this without some visuals, pinpointing important insights and other findings. Additionally, whenever predictive models are involved, specialized visuals for showcasing the models' performance are a must. That's why data visualization as a sub-field of data science and data analytics has grown, especially in the past few years. The development of professional software undertaking such tasks and specialized libraries in various programming languages have contributed to this growth.
Beyond data visualization, however, other subtle aspects of the data science and data analytics fields are essential but less pronounced in the various educational material out there. For example, communicating insights and using the visuals mentioned earlier in presentations is something every data professional ought to know. This point is particularly important when you need to liaise with non-technical people, whether colleagues or clients. Also, managing a data analytics project can be challenging, especially in the modern Agile-driven workplace. After all, most data analytics projects today are all about teamwork, tight deadlines, and changing requirements. What's more, although a dashboard is a powerful asset in an organization, it needs to be maintained periodically and fed good-quality data. The latter requires additional work and proper data governance, which, unfortunately, not everyone involved in this field is aware of.
My Data Scientist Bedside Manner book, which I co-authored last year, is an excellent resource for this kind of topic. Although written for data science professionals mainly, it can be useful to all sorts of data analysts and people involved in data-driven projects (e.g., managers). The idea is to bridge the gap between technical and non-technical professionals in an organization and leverage data analytics work effectively. This is an excellent reference book that every data professional can benefit from in the years to come. Cheers!
Just like other fields, data science has evolved over the past few years. One of the most evident aspects of this evolution is that data scientists are found in teams nowadays. Even consultancies are often team-based, enabling them to undertake a whole project flexibly and efficiently. But how do we build a data science team exactly? First, we need to look at the different types of data scientists and explore the different specialization levels such a professional may have.
Nowadays, there are several types of data scientists. The most important of them are the data engineer (delving into low-level tasks, such as ETL and handling any cloud-related operations) and the data modeling expert (usually referred to as just data scientist, or machine learning expert when the role is more specialized). Additionally, there are the data visualization expert, the data science manager, and the data communicator (a more niche role that's not as widespread). Of course, depending on the data science area that a data scientist specializes in, there is also the NLP expert, the A.I. expert, etc. So, it's safe to say that the data scientist role is quite diverse these days.
Speaking of specialization, that's a topic on its own that plays a role in data science work. The specialist is the most common scenario, whereby a data scientist is really good at one particular task and fairly mediocre in other tasks not related to that task. On the other hand, a generalist is quite decent in various tasks but not particularly good at any specific task. Such a person may be a good team leader, but wouldn't be ideal for tackling a particularly challenging problem. Beyond these two, there is also the versatilist, who is quite good at one (or more) tasks but also quite decent in other tasks. It's like a combination of a specialist and a generalist, making an excellent asset in a team, especially in data science work.
So, how do we go about building a data science team? The team's specifics always depend on the project at hand, but in general terms, you can build a team as follows. For starters, you need to get a versatilist or experienced generalist as the team leader. This person can help build the team by finding professionals with a similar working style and cultural fit. Having a second generalist or versatilist may also be useful, depending on the size of the team. Additionally, you can have two or three specialists, one of whom would need to be a data engineer. If your team needs to work with clients directly, you may need to consider having a data communicator. Also, if the team's expected outcomes are more geared towards dashboards and graphics, you may need to have a data visualization expert onboard.
Should you wish to learn more about this topic and other organizational aspects of the data science field, you can check out the Data Scientist Bedside Manner book I co-authored last year. This book examines various aspects of data science work, focusing on the non-technical ones and various useful tips as to how you can improve your data science career. Check it out when you have the chance. Cheers!
As you may have noticed, data analytics is always evolving as a field, so it's not surprising to see data science change from year to year. What was hot and trendy in 2020 may not be as prominent soon and vice versa. That's not to say that you should expect to see drastic changes in 2021, but it's good to adapt your expectations, taking into account the latest trends. In this article, we'll explore all that and see how you can benefit from these insights for your career in data science.
It's no secret that deep learning is gaining even more popularity, particularly in time series analysis. So, RNNs are bound to rise in demand as a skill, particularly if you are involved in the field's forecasting part. Additionally, healthcare seems to be becoming more aligned with this tech, so it is expected that more medicine-related organizations are going to be looking for data scientists to join their ranks. What's more, IoT is expected to incorporate AI, making our work more relevant to infrastructure projects. Moreover, we should expect Reinforcement Learning (RL) to grow further, as its use cases, such as chatbots, are growing in popularity. Finally, it seems that more and more people are becoming aware of data science and AI's benefits, so it's easier than ever to make the business case for advanced analytics in a company. Simultaneously, the cloud provides a viable solution for the hardware required, something that's bound to stick in the coming years.
Based on the above, it's reasonable to deduce that the data science specializations most likely to be relevant this coming year are AI expert, data engineer (particularly the one geared towards machine learning, aka machine learning engineer), and data scientist with domain knowledge in healthcare or IoT. Naturally, Natural Language Processing experts are bound to remain in demand, particularly if they possess chatbot know-how.
Beyond all that, it’s important to remember that the one thing that’s bound to remain relevant in the years to come, regardless of these trends, is the data science mindset. This mindset involves various aspects, such as problem-solving skills, creativity applied in analytics work, meeting deadlines, and collaborating with other data professionals, to name a few. The data science mindset is our attitude towards the data science problems we have to solve. As such, it's something essential and perhaps more relevant than whatever skill is in vogue at any given time.
You can learn more about the data science mindset and other relevant topics in this field through one of my books, titled Data Science Mindset, Methodologies, and Misconceptions. There I explore the various aspects of the field without getting too technical, all while highlighting those skills that make up the data science mindset. I include some soft skills and some hard ones that are still relevant today, even if some of the tools have evolved since then. So, check it out when you have a moment. Cheers!
Natural Language Processing (NLP) is an essential part of data science today. Although its focus is on analyzing text, its benefits go beyond this and extend to improving an existing text. In a world where written communication is becoming more prevalent, this is a powerful aspect of the field. In this article, we'll look into all that through a practical and not-too-technical perspective.
Let’s start with what NLP is. NLP is a specialized field on the overlap of data science and A.I., geared towards analyzing text data, mainly text forming complete sentences. NLP aims to understand things like the tone of the text, its sentiment polarity, and its intention. Specialized aspects of NLP focus on understanding the meaning of the text to provide more in-depth insights regarding it. This kind of NLP is under the Natural Language Understanding (NLU) umbrella, and it's a more advanced aspect of NLP. Other advanced aspects of NLP involve creating new text based on a given text or sometimes even just a prompt.
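To give a rough feel for what estimating sentiment polarity means, here is a toy, lexicon-based sketch. It is only an illustration: the word lists are invented and tiny, and real NLP systems typically rely on trained models rather than a fixed word count like this one.

```python
import re

# Tiny, made-up word lists; a real lexicon would be far larger
POSITIVE = {"good", "great", "excellent", "useful"}
NEGATIVE = {"bad", "poor", "terrible", "useless"}


def polarity(text):
    """Label a text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((word in POSITIVE) - (word in NEGATIVE) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("This is a great and useful tool"))
print(polarity("A terrible, useless report"))
```

A word-counting approach like this misses negation, sarcasm, and context, which is part of why the more advanced NLU methods mentioned above exist.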
A given text can improve in various ways. Apart from the apparent corrections (e.g., typos and incomplete sentences), it can be made clearer, less wordy, and more elegant. Changes like these involve improving the vocabulary involved as well as the sentence structure. To automate this process, some NLP work is necessary. Additionally, the user's feedback can be incorporated to enhance the text further, mitigating any inaccurate improvements. What's more, additional improvements from a human editor can be incorporated into the NLP model, even if that editor is the original text's creator. In any case, this is a long process that involves considering various factors, such as the audience the text is targeted at, the objective of the text, etc. That's why to improve a text, you need a systematic and sophisticated approach, one that is versatile enough to adapt and evolve. In short, you need an A.I. system designed for this specific task.
All this may seem like a lot of work for just making a piece of text look nicer, perhaps a bit of an overkill. However, considering the effects of this work, it may be an excellent investment. In particular, when the system provides suitable corrections to the user (who is also the original text's creator), the user can improve their writing style and mastery of the language. This is particularly the case when the user is not well versed in linguistics and makes many mistakes. So, this simple NLP pipeline, which is also mostly self-sufficient, can improve the user too, all while freeing up their time for other, more challenging tasks. What's more, a good text can help communication among people, effectively making this NLP work an excellent time-saver for everyone involved with the text.
But where can someone find such an NLP system that can improve a given text and the person who wrote it, eventually? Well, Grammarly has you covered in that regard. This company has developed a powerful A.I. system that does just that, all while having an intuitive and easy-to-use interface that integrates well with your web browser. Having used this system myself for over a year now, I can attest to its usefulness and insightful feedback. Check it out when you have the chance. Cheers!
Data analytics is the field of analyzing data and using any insights you've discovered to facilitate an organization's workflow. It has a very hands-on approach to things and focuses on describing a problem accurately and liaising with the stakeholders to drive decisions based on the data analyzed. Data science is akin to that, while it employs the scientific method to go deeper into the data and develop more sophisticated strategies to drive those decisions.
Nowadays, the roles of the data analyst and the data scientist are somewhat mixed since the latter is still relatively new. The fact that its evangelists haven't publicized it accurately makes it even more challenging to understand how it fills a somewhat different niche as a role. The data analyst is more widespread as a role and is tied to a large variety of tasks, including marketing analytics (e.g., SEO) and business intelligence (BI) work. An organization usually leverages a data scientist in cases with more unstructured data (e.g., text) or data in various forms, making data wrangling a necessary part of the analysis. Also, in scenarios where the objectives are not as clear-cut (e.g., predict the sales of the next quarter), a data scientist is usually preferred. However, it's worth pointing out that many data scientists start their careers as data analysts and that both roles are necessary. After all, they both work with data to produce insights, even if they often go about it differently.
Beyond the differences mentioned previously, another critical difference between the two roles is the models used. Namely, the data analyst is more geared towards describing the data and understanding the problem it represents. The data scientist goes a step further and also makes predictions (through mathematical models, usually based on machine learning), digging deeper into it. As a result, she can put together a complete solution, such as a predictive model accessible through an API. Naturally, she can also create a dashboard, something that is among the data analyst's deliverables.
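The contrast between describing and predicting can be sketched in a few lines of Python. The numbers below are invented, and a plain least-squares line stands in for the machine learning models a data scientist would more typically use; the function names are hypothetical too.

```python
from statistics import mean


def describe(values):
    """Descriptive view: summarize what the data looks like."""
    return {"min": min(values), "mean": mean(values), "max": max(values)}


def fit_line(xs, ys):
    """Predictive view: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented quarterly sales figures, for illustration only
quarters = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

summary = describe(sales)
slope, intercept = fit_line(quarters, sales)
forecast = slope * 5 + intercept  # estimate for the next (fifth) quarter
print(summary, forecast)
```

The summary answers "what happened?", while the fitted line offers an estimate of "what happens next?", which is the extra step the paragraph above attributes to the data scientist.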
What's more, although data analysts can tackle all sorts of data, they usually draw the line at text and semi-structured data. For these sorts of datasets, specialized methods are required, such as Natural Language Processing (NLP), which falls in the data scientist's domain. The use of AI is often essential in problems like this, something that a data scientist is usually required to know but that lies beyond the job description of a data analyst.
You can learn more about data science and the data scientist's role through a couple of my books. Specifically, the book Data Scientist: The Definitive Guide to Becoming a Data Scientist illustrates the ins and outs of this role and offers some practical advice as to how you can pursue a career in this field. Additionally, the Data Science Mindset, Methodologies, and Misconceptions book showcases the field overall and its defining aspects as well as the essential techniques used. Both books together offer a bird's-eye view of data science and how you can build your career in it. Check them out when you have a chance. Cheers!
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.